The semiconservative quasispecies equations in [27] 
were derived under the assumption of perfect lesion re- 
pair. Briefly, after replication has occurred, and both 
daughter genomes have been synthesized, it is possible 
that there are still mismatched base-pairs in the daugh- 
ter genomes which were not corrected by various error- 
correcting mechanisms of the replication process itself 
(two such error-correcting mechanisms are the built-in 
proofreading capabilities of the DNA replicases, and the 
mismatch repair pathway) [35]. Any remaining mis- 
matches will result in lesions along the DNA chain, which 
are recognized and repaired by various maintenance and 
repair enzymes present in the cell. However, after repli- 
cation has occurred, it is no longer possible to distinguish 
between parent and daughter strands, and so the lesion 
is correctly repaired with a probability of 1/2. 

In a recent work [30], Brumer and Shakhnovich studied 
the semiconservative quasispecies equations with imper- 
fect lesion repair. The authors postulated that imper- 
fect lesion repair may be necessary to reconcile the high 
point-mutation rates observed in certain cancers (the Mi- 
crosatellite INstability, or MIN, tumors) with semicon- 
servative replication. The argument stems from the fact 
that semiconservative replication is considerably less ro- 
bust to the effect of replication errors than is conserva- 
tive replication [27]. However, mutational robustness can 
be increased by reducing the efficiency of lesion repair. 
Imperfect lesion repair breaks the perfect correlation be- 
tween the parent and daughter strands, thereby allow- 
ing for better preservation of genetic information. Thus, 
semiconservative replication with imperfect lesion repair 
can behave more like a conservatively replicating system 
in certain cases [30] (we will make this statement more 
precise later in the paper). 

Subsequently, it was shown that imperfect lesion repair 
may also be necessary when modeling stem cell growth, in 
order to properly account for the effect of age-dependent 
chromosome segregation (known as the immortal strand 
hypothesis) [36]. Thus, it is apparent that imperfect le- 
sion repair may be necessary for a proper modeling of 
the evolutionary dynamics of many biologically impor- 
tant phenomena. 

Therefore, in this paper, we continue the work initiated 
in [27], and develop an extension of the semiconservative 
quasispecies equations which allows for arbitrary lesion 
repair probabilities. While the main results of this paper 
may be found in [37], the full details of the arbitrary 
lesion repair model are contained here. 

This paper is organized as follows: In the following 
section, we present the finite genome length quasispecies 
equations for arbitrary lesion repair. While we cannot 
convert the dynamics over the space of double-stranded 
genomes to the space of single strands, as was possible 
in [27], we can nevertheless make an analogous transfor- 
mation and convert the dynamics to the space of ordered 
strand pairs. In Section III, we go on to establish the infinite
sequence length form of the equations for a class of
fitness landscapes which are defined by a single, "master"
genome. In Section IV, we explicitly solve for the equilibrium
behavior of a subclass of these landscapes, which we
call a generalized single-fitness-peak landscape. We also
determine the critical mutation rate necessary for inducing
error catastrophe for this class of fitness landscapes.
In Section V, we explore the equilibrium behavior with 
specific examples, and discuss similarities and differences 
with both conservative and semiconservative replication 
with perfect lesion repair. We also present results from 
stochastic simulations of finite populations of replicating 
organisms, in order to corroborate the theory developed 
in this paper. Finally, in Section VI we conclude with 
a summary of our results and discuss plans for future 
research. 



II. THE FINITE SEQUENCE LENGTH 
EQUATIONS 

A. From double-stranded genomes to ordered 
sequence-pairs 

Double-stranded DNA consists of two complementary, antiparallel strands [27, 35]. Each DNA genome is defined by the pair of strands $\{\sigma, \sigma'\} = \{\sigma, \bar{\sigma}\}$, where $\bar{\sigma}$ denotes the complement of $\sigma$. If each base is drawn from an alphabet of size $S$ (where $S = 4$ due to Watson-Crick pairing), and if $\bar{b}_i$ denotes the complement of a base $b_i$, then if $\sigma = b_1 \ldots b_L$, we have, by the antiparallel nature of DNA, that $\bar{\sigma} = \bar{b}_L \ldots \bar{b}_1$.

The replication of a DNA genome $\{\sigma, \bar{\sigma}\}$ may be divided into three stages:

1. Strand separation - The genome unzips to produce 
two parent strands, $\sigma$ and $\bar{\sigma}$.

2. Daughter strand synthesis - Each parent strand 
serves as the template for the synthesis of a com- 
plementary daughter strand. 

3. Lesion repair after cell division. 

An illustration of semiconservative replication may be 
found in [27]. 

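This replication mechanism leads to the semiconservative quasispecies equations developed in [27]:

$$\frac{dx_{\{\sigma,\bar{\sigma}\}}}{dt} = -\left(\kappa_{\{\sigma,\bar{\sigma}\}} + \bar{\kappa}(t)\right) x_{\{\sigma,\bar{\sigma}\}} + \sum_{\{\sigma',\bar{\sigma}'\}} \kappa_{\{\sigma',\bar{\sigma}'\}}\, x_{\{\sigma',\bar{\sigma}'\}} \left[ p(\sigma', \{\sigma,\bar{\sigma}\}) + p(\bar{\sigma}', \{\sigma,\bar{\sigma}\}) \right] \tag{1}$$

where $x_{\{\sigma,\bar{\sigma}\}}$ denotes the fraction of the population with genome $\{\sigma, \bar{\sigma}\}$, $\kappa_{\{\sigma,\bar{\sigma}\}}$ denotes the first-order growth rate constant, or fitness, associated with genome $\{\sigma, \bar{\sigma}\}$, $p(\sigma', \{\sigma, \bar{\sigma}\})$ denotes the probability that the parent strand $\sigma'$ forms the genome $\{\sigma, \bar{\sigma}\}$ after daughter strand synthesis and lesion repair, and $\bar{\kappa}(t) = \sum_{\{\sigma,\bar{\sigma}\}} \kappa_{\{\sigma,\bar{\sigma}\}} x_{\{\sigma,\bar{\sigma}\}}$ is the mean fitness of the population.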
When lesion repair is imperfect, the correlation between the two strands is broken, and we must consider a more generalized dynamics over genomes of the form $\{\sigma, \sigma'\}$, where both $\sigma$ and $\sigma'$ are arbitrary. Following the derivation in [27], we obtain the quasispecies equations,

$$\frac{dx_{\{\sigma,\sigma'\}}}{dt} = -\left(\kappa_{\{\sigma,\sigma'\}} + \bar{\kappa}(t)\right) x_{\{\sigma,\sigma'\}} + \sum_{\{\sigma'',\sigma'''\}} \kappa_{\{\sigma'',\sigma'''\}}\, x_{\{\sigma'',\sigma'''\}} \times \left[ p((\sigma'',\sigma'''), \{\sigma,\sigma'\}) + p((\sigma''',\sigma''), \{\sigma,\sigma'\}) \right] \tag{2}$$

Here, $p((\sigma'',\sigma'''), \{\sigma,\sigma'\})$ denotes the probability that parent strand $\sigma''$, as part of genome $\{\sigma'',\sigma'''\}$, becomes genome $\{\sigma,\sigma'\}$ after daughter strand synthesis and lesion repair. In addition, we have $\bar{\kappa}(t) = \sum_{\{\sigma'',\sigma'''\}} \kappa_{\{\sigma'',\sigma'''\}} x_{\{\sigma'',\sigma'''\}}$. The definitions are otherwise unchanged from the original semiconservative equations.

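In the semiconservative quasispecies equations, the complementarity property allows one to convert the quasispecies dynamics over the space of double-stranded genomes into an equivalent, and considerably simpler, dynamics over the space of single strands [27]. With imperfect lesion repair, the lack of perfect correlation between the two strands in the genome makes a conversion to a single strand model impossible. Nevertheless, we can make an analogous transformation of the dynamics, from double-stranded genomes $\{\sigma, \sigma'\}$ to ordered pairs of strands, $(\sigma, \sigma')$, as follows: We define $y_{(\sigma,\sigma')} = y_{(\sigma',\sigma)} = \frac{1}{2} x_{\{\sigma,\sigma'\}}$ if $\sigma \neq \sigma'$, and $y_{(\sigma,\sigma)} = x_{\{\sigma,\sigma\}}$. Also, we define $\kappa_{(\sigma,\sigma')} = \kappa_{(\sigma',\sigma)} = \kappa_{\{\sigma,\sigma'\}}$. We then have that,

$$\begin{aligned}
&\sum_{\{\sigma'',\sigma'''\}} \kappa_{\{\sigma'',\sigma'''\}}\, x_{\{\sigma'',\sigma'''\}} \times \left[ p((\sigma'',\sigma'''), \{\sigma,\sigma'\}) + p((\sigma''',\sigma''), \{\sigma,\sigma'\}) \right] \\
&\quad = 2 \sum_{\{\sigma'',\sigma'''\},\ \sigma'' \neq \sigma'''} \left[ \kappa_{(\sigma'',\sigma''')}\, y_{(\sigma'',\sigma''')}\, p((\sigma'',\sigma'''), \{\sigma,\sigma'\}) + \kappa_{(\sigma''',\sigma'')}\, y_{(\sigma''',\sigma'')}\, p((\sigma''',\sigma''), \{\sigma,\sigma'\}) \right] \\
&\qquad + 2 \sum_{\{\sigma'',\sigma''\}} \kappa_{(\sigma'',\sigma'')}\, y_{(\sigma'',\sigma'')}\, p((\sigma'',\sigma''), \{\sigma,\sigma'\}) \\
&\quad = 2 \sum_{(\sigma'',\sigma''')} \kappa_{(\sigma'',\sigma''')}\, y_{(\sigma'',\sigma''')}\, p((\sigma'',\sigma'''), \{\sigma,\sigma'\})
\end{aligned} \tag{3}$$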
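The change of variables from genome fractions $x_{\{\sigma,\sigma'\}}$ to ordered-pair fractions $y_{(\sigma,\sigma')}$ can be illustrated with a short Python sketch. The dictionary-based representation and the name `ordered_pair_fractions` are assumptions made for this illustration only.

```python
def ordered_pair_fractions(x):
    """Convert genome fractions x into ordered-pair fractions y.

    x maps a frozenset of strands (two distinct strands, or a single
    strand when both strands of the genome are identical) to its
    population fraction. The result maps ordered pairs (s, s2) to y.
    """
    y = {}
    for genome, fraction in x.items():
        strands = tuple(genome)
        if len(strands) == 1:
            # Identical strands: a single ordered pair carries the full weight.
            s = strands[0]
            y[(s, s)] = fraction
        else:
            # Distinct strands: split the weight over both orderings.
            s, s2 = strands
            y[(s, s2)] = fraction / 2
            y[(s2, s)] = fraction / 2
    return y


x = {frozenset({"AC", "GT"}): 0.6, frozenset({"AT"}): 0.4}
y = ordered_pair_fractions(x)
print(sum(y.values()))  # total population fraction is preserved
```

The factor of 1/2 for distinct strands is what makes the total of the $y$ values equal the total of the $x$ values, so $y$ is again a normalized distribution.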

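Finally, we define $p((\sigma'',\sigma'''), (\sigma,\sigma'))$ to be the probability that $\sigma''$, as part of genome $\{\sigma'',\sigma'''\}$, becomes $\sigma$, with daughter strand $\sigma'$ (after daughter strand synthesis and lesion repair). Then it follows that,

$$p((\sigma'',\sigma'''), \{\sigma,\sigma'\}) = \begin{cases} p((\sigma'',\sigma'''), (\sigma,\sigma')) + p((\sigma'',\sigma'''), (\sigma',\sigma)) & \text{if } \sigma \neq \sigma' \\ p((\sigma'',\sigma'''), (\sigma,\sigma')) & \text{if } \sigma = \sigma' \end{cases} \tag{4}$$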



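For $\sigma' \neq \sigma$, we therefore obtain that,

$$\frac{dy_{(\sigma,\sigma')}}{dt} = \frac{1}{2} \frac{dx_{\{\sigma,\sigma'\}}}{dt} = -\left(\kappa_{(\sigma,\sigma')} + \bar{\kappa}(t)\right) y_{(\sigma,\sigma')} + \sum_{(\sigma'',\sigma''')} \kappa_{(\sigma'',\sigma''')}\, y_{(\sigma'',\sigma''')} \times \left[ p((\sigma'',\sigma'''), (\sigma,\sigma')) + p((\sigma'',\sigma'''), (\sigma',\sigma)) \right] \tag{5}$$

The same equation holds for $y_{(\sigma,\sigma)}$, since $2 p((\sigma'',\sigma'''), \{\sigma,\sigma\}) = p((\sigma'',\sigma'''), (\sigma,\sigma)) + p((\sigma'',\sigma'''), (\sigma,\sigma))$. Therefore, the quasispecies dynamics over the space of ordered sequence-pairs is given by,

$$\frac{dy_{(\sigma,\sigma')}}{dt} = -\left(\kappa_{(\sigma,\sigma')} + \bar{\kappa}(t)\right) y_{(\sigma,\sigma')} + \sum_{(\sigma'',\sigma''')} \kappa_{(\sigma'',\sigma''')}\, y_{(\sigma'',\sigma''')} \times \left[ p((\sigma'',\sigma'''), (\sigma,\sigma')) + p((\sigma'',\sigma'''), (\sigma',\sigma)) \right] \tag{6}$$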



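As a final derivation in this subsection, we will obtain an equivalent formulation of Eq. (6) which will prove useful later. To begin, suppose that the fitness landscape is such that $\kappa_{(\bar{\sigma},\bar{\sigma}')} = \kappa_{(\sigma,\sigma')}$. Furthermore, suppose we have that $p((\bar{\sigma}'',\bar{\sigma}'''), (\bar{\sigma},\bar{\sigma}')) = p((\sigma'',\sigma'''), (\sigma,\sigma'))$. Then, if our population is initially lesion-free, we claim that $y_{(\bar{\sigma},\bar{\sigma}')} = y_{(\sigma,\sigma')}$ at all times.

To see this, note first that a lesion-free population is equivalent to the statement that $y_{(\sigma,\sigma')} = 0$ if $\sigma' \neq \bar{\sigma}$. Then if $\sigma' \neq \bar{\sigma}$, it certainly follows that $\bar{\sigma}' \neq \sigma$, hence $y_{(\sigma,\sigma')} = 0 = y_{(\bar{\sigma},\bar{\sigma}')}$. On the other hand, if $\sigma' = \bar{\sigma}$, then $y_{(\sigma,\sigma')} = y_{(\sigma',\sigma)} = y_{(\bar{\sigma},\bar{\sigma}')}$. Therefore, a population which is lesion-free satisfies the property that $y_{(\bar{\sigma},\bar{\sigma}')} = y_{(\sigma,\sigma')}$ for all sequence pairs $(\sigma, \sigma')$.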



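Then in order to prove that $y_{(\bar{\sigma},\bar{\sigma}')} = y_{(\sigma,\sigma')}$ at all times, we need only show that $y_{(\bar{\sigma},\bar{\sigma}')} = y_{(\sigma,\sigma')}$ at some time $t$ implies that $dy_{(\bar{\sigma},\bar{\sigma}')}/dt = dy_{(\sigma,\sigma')}/dt$. We have,

$$\begin{aligned}
\frac{dy_{(\bar{\sigma},\bar{\sigma}')}}{dt} &= -\left(\kappa_{(\bar{\sigma},\bar{\sigma}')} + \bar{\kappa}(t)\right) y_{(\bar{\sigma},\bar{\sigma}')} + \sum_{(\sigma'',\sigma''')} \kappa_{(\sigma'',\sigma''')}\, y_{(\sigma'',\sigma''')} \times \left[ p((\sigma'',\sigma'''), (\bar{\sigma},\bar{\sigma}')) + p((\sigma'',\sigma'''), (\bar{\sigma}',\bar{\sigma})) \right] \\
&= -\left(\kappa_{(\sigma,\sigma')} + \bar{\kappa}(t)\right) y_{(\sigma,\sigma')} + \sum_{(\sigma'',\sigma''')} \kappa_{(\bar{\sigma}'',\bar{\sigma}''')}\, y_{(\bar{\sigma}'',\bar{\sigma}''')} \times \left[ p((\bar{\sigma}'',\bar{\sigma}'''), (\bar{\sigma},\bar{\sigma}')) + p((\bar{\sigma}'',\bar{\sigma}'''), (\bar{\sigma}',\bar{\sigma})) \right] \\
&= -\left(\kappa_{(\sigma,\sigma')} + \bar{\kappa}(t)\right) y_{(\sigma,\sigma')} + \sum_{(\sigma'',\sigma''')} \kappa_{(\sigma'',\sigma''')}\, y_{(\sigma'',\sigma''')} \times \left[ p((\sigma'',\sigma'''), (\sigma,\sigma')) + p((\sigma'',\sigma'''), (\sigma',\sigma)) \right] \\
&= \frac{dy_{(\sigma,\sigma')}}{dt}
\end{aligned} \tag{7}$$

which establishes our claim.

So let us assume that our fitness landscape is such that $\kappa_{(\bar{\sigma},\bar{\sigma}')} = \kappa_{(\sigma,\sigma')}$, and that $p((\bar{\sigma}'',\bar{\sigma}'''), (\bar{\sigma},\bar{\sigma}')) = p((\sigma'',\sigma'''), (\sigma,\sigma'))$. If our population is initially lesion-free, then for all sequence pairs $(\sigma, \sigma')$ we have $y_{(\bar{\sigma},\bar{\sigma}')} = y_{(\sigma,\sigma')}$. This gives,

$$\begin{aligned}
\frac{dy_{(\sigma,\sigma')}}{dt} &= -\left(\kappa_{(\sigma,\sigma')} + \bar{\kappa}(t)\right) y_{(\sigma,\sigma')} + \sum_{(\sigma'',\sigma''')} \kappa_{(\sigma'',\sigma''')}\, y_{(\sigma'',\sigma''')}\, p((\sigma'',\sigma'''), (\sigma,\sigma')) + \sum_{(\sigma'',\sigma''')} \kappa_{(\sigma'',\sigma''')}\, y_{(\sigma'',\sigma''')}\, p((\sigma'',\sigma'''), (\sigma',\sigma)) \\
&= -\left(\kappa_{(\sigma,\sigma')} + \bar{\kappa}(t)\right) y_{(\sigma,\sigma')} + \sum_{(\sigma'',\sigma''')} \kappa_{(\sigma'',\sigma''')}\, y_{(\sigma'',\sigma''')}\, p((\sigma'',\sigma'''), (\sigma,\sigma')) + \sum_{(\sigma'',\sigma''')} \kappa_{(\bar{\sigma}'',\bar{\sigma}''')}\, y_{(\bar{\sigma}'',\bar{\sigma}''')}\, p((\bar{\sigma}'',\bar{\sigma}'''), (\bar{\sigma}',\bar{\sigma}))
\end{aligned} \tag{8}$$

which can be simplified to give,

$$\frac{dy_{(\sigma,\sigma')}}{dt} = -\left(\kappa_{(\sigma,\sigma')} + \bar{\kappa}(t)\right) y_{(\sigma,\sigma')} + \sum_{(\sigma'',\sigma''')} \kappa_{(\sigma'',\sigma''')}\, y_{(\sigma'',\sigma''')} \times \left[ p((\sigma'',\sigma'''), (\sigma,\sigma')) + p((\sigma'',\sigma'''), (\bar{\sigma}',\bar{\sigma})) \right] \tag{9}$$

We will make use of these equations when considering the behavior of the quasispecies dynamics in the limit of infinite genome lengths.

B. Determination of $p((\sigma'', \sigma'''), (\sigma, \sigma'))$

We now compute $p((\sigma'',\sigma'''), (\sigma,\sigma'))$, assuming that with each genome $\{\sigma, \sigma'\}$ there is a base-pair independent mismatch probability, denoted by $\epsilon_{\{\sigma,\sigma'\}}$, and a base-pair independent lesion repair probability, denoted by $\lambda_{\{\sigma,\sigma'\}}$ (the genome dependence of the mismatch and lesion repair probabilities arises from the fact that different genomes may code for different enzymes, or none at all, that are involved in DNA repair. See for instance [24-26]).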

We begin with some definitions: Define $\sigma_C$ to be the subsequence of bases in $\sigma$ which are complementary with the corresponding bases in $\sigma'$. That is, suppose $\sigma = b_1 \ldots b_L$ and $\sigma' = b'_1 \ldots b'_L$, and suppose for indices $i_1 < i_2 < \cdots < i_k$ we have that $b_{i_j} = \bar{b}'_{L-i_j+1}$. Then $\sigma_C = b_{i_1} \ldots b_{i_k}$. We also define $\sigma'_C$ to be the subsequence of corresponding bases in $\sigma'$, so that $\sigma'_C = b'_{L-i_k+1} \ldots b'_{L-i_1+1}$. Finally, let $\sigma''_C$ denote the subsequence of bases in $\sigma''$ corresponding to the bases in $\sigma_C$, so that $\sigma''_C = b''_{i_1} \ldots b''_{i_k}$.

Now, define $\sigma_{NC}$ to be the subsequence of bases in $\sigma$ which are not complementary with the corresponding bases in $\sigma'$. That is, given the complementary indices $i_1 < i_2 < \cdots < i_k$ defined above, let $i'_1 < i'_2 < \cdots < i'_{L-k}$ be the remaining indices. Then $\sigma_{NC} = b_{i'_1} \ldots b_{i'_{L-k}}$. We define $\sigma'_{NC}$ to be the subsequence of corresponding bases in $\sigma'$, so that $\sigma'_{NC} = b'_{L-i'_{L-k}+1} \ldots b'_{L-i'_1+1}$. Finally, we let $\sigma''_{NC}$ denote the subsequence of bases in $\sigma''$ corresponding to the bases in $\sigma_{NC}$, so that $\sigma''_{NC} = b''_{i'_1} \ldots b''_{i'_{L-k}}$.

We now let $p((\sigma'', \sigma''''); \sigma''')$ denote the probability that $\sigma''$, as part of genome $\{\sigma'', \sigma'''\}$, is paired with $\sigma''''$ during daughter strand synthesis. We also let $p((\sigma'', \sigma'''') \rightarrow (\sigma, \sigma'); \sigma''')$ denote the probability that $\sigma''$ becomes $\sigma$ and $\sigma''''$ becomes $\sigma'$ during lesion repair (the presence of the $\sigma'''$ in this notation is to indicate that $\sigma''$ comes from genome $\{\sigma'', \sigma'''\}$. Presumably, the enzymes involved in lesion repair are the ones that came from the original parent cell, hence the lesion repair probability should be $\lambda_{\{\sigma'',\sigma'''\}}$). Then we have that,

$$p((\sigma'',\sigma'''), (\sigma,\sigma')) = \sum_{\sigma''''} p((\sigma'', \sigma''''); \sigma''') \times p((\sigma'', \sigma'''') \rightarrow (\sigma, \sigma'); \sigma''') \tag{10}$$

Consider some base $b''_i$ in $\sigma''$, and suppose that $b''_i$ is part of $\sigma''_C$. If $b''_i$ differs from the corresponding base $b_i$ in $\sigma_C$, then it is clear that during daughter strand synthesis, it must be paired with $\bar{b}_i$, and during lesion repair it is $b''_i$ that must be repaired to form $b_i$. Therefore, if $l_C = D_H(\sigma''_C, \sigma_C)$ denotes the Hamming distance between $\sigma''_C$ and $\sigma_C$, then $b''_i$ must be paired with $\bar{b}_i$, and the $(b''_i, \bar{b}_i)$ lesion must be repaired to $(b_i, \bar{b}_i)$, in $l_C$ places. The probability of mispairing a given $b''_i$ with $\bar{b}_i$ is $\epsilon_{\{\sigma'',\sigma'''\}}/(S-1)$. The probability of lesion repair is $\lambda_{\{\sigma'',\sigma'''\}}$. Finally, assuming lesion repair occurs, the probability of repairing $b''_i$ is $1/2$. Assuming that $\sigma''''$ is chosen to satisfy the pairing requirements described above, we obtain a factor of $\left(\lambda_{\{\sigma'',\sigma'''\}} \epsilon_{\{\sigma'',\sigma'''\}} / 2(S-1)\right)^{l_C}$ contribution to $p((\sigma'',\sigma''''); \sigma''')\, p((\sigma'',\sigma'''') \rightarrow (\sigma, \sigma'); \sigma''')$.

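Now, let $L_C$ denote the length of $\sigma_C$, so that $\sigma''_C$ and $\sigma_C$ are equal in $L_C - l_C$ positions. Then, given some $b''_i$ in one of these $L_C - l_C$ positions, it can be paired with any other base. Let $l_{C,1}$ denote the number, among these positions, where $b''_i$ is mispaired with a base other than $\bar{b}''_i = \bar{b}_i$. Then, among these $L_C - l_C$ positions, $b''_i$ is paired with $\bar{b}''_i$ in $L_C - l_C - l_{C,1}$ positions. Since lesion repair must happen in $l_{C,1}$ positions, then for an appropriately chosen $\sigma''''$, we have a factor of $\left(\lambda_{\{\sigma'',\sigma'''\}} \epsilon_{\{\sigma'',\sigma'''\}} / 2(S-1)\right)^{l_{C,1}} \left(1 - \epsilon_{\{\sigma'',\sigma'''\}}\right)^{L_C - l_C - l_{C,1}}$ contribution to $p((\sigma'',\sigma''''); \sigma''')\, p((\sigma'',\sigma'''') \rightarrow (\sigma, \sigma'); \sigma''')$.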

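Finally, let $L_{NC}$ denote the length of $\sigma_{NC}$. Since $\sigma_{NC}$ and $\sigma'_{NC}$ are not complementary, no lesion repair can happen at positions in $\sigma''_{NC}$. Therefore, $\sigma''_{NC}$ cannot be changed, hence we must have $\sigma''_{NC} = \sigma_{NC}$. Also, a mismatch must occur at all sites along $\sigma''_{NC}$ to form the corresponding bases in $\sigma'_{NC}$. Once again, for an appropriately chosen $\sigma''''$, we have a factor of $\delta_{\sigma''_{NC}\sigma_{NC}} \left( (1 - \lambda_{\{\sigma'',\sigma'''\}}) \epsilon_{\{\sigma'',\sigma'''\}} / (S-1) \right)^{L_{NC}}$ contribution to $p((\sigma'',\sigma''''); \sigma''')\, p((\sigma'',\sigma'''') \rightarrow (\sigma, \sigma'); \sigma''')$. Therefore, given a daughter strand $\sigma''''$ for which $(\sigma'', \sigma'''')$ can become $(\sigma, \sigma')$ after lesion repair, we have,

$$p((\sigma'',\sigma''''); \sigma''')\, p((\sigma'',\sigma'''') \rightarrow (\sigma, \sigma'); \sigma''') = \delta_{\sigma''_{NC}\sigma_{NC}} \left( \frac{\lambda_{\{\sigma'',\sigma'''\}} \epsilon_{\{\sigma'',\sigma'''\}}}{2(S-1)} \right)^{l_C} \left( \frac{(1 - \lambda_{\{\sigma'',\sigma'''\}}) \epsilon_{\{\sigma'',\sigma'''\}}}{S-1} \right)^{L_{NC}} \left( \frac{\lambda_{\{\sigma'',\sigma'''\}} \epsilon_{\{\sigma'',\sigma'''\}}}{2(S-1)} \right)^{l_{C,1}} \left(1 - \epsilon_{\{\sigma'',\sigma'''\}}\right)^{L_C - l_C - l_{C,1}} \tag{11}$$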



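To evaluate the sum in Eq. (10), we need only sum over those $\sigma''''$ for which the product $p((\sigma'',\sigma''''); \sigma''')\, p((\sigma'',\sigma'''') \rightarrow (\sigma, \sigma'); \sigma''')$ given in Eq. (11) is nonzero. Thus, we sum over all possible values of $l_{C,1}$, taking into account degeneracies for each value of $l_{C,1}$. Writing $\epsilon$ and $\lambda$ for $\epsilon_{\{\sigma'',\sigma'''\}}$ and $\lambda_{\{\sigma'',\sigma'''\}}$, this gives,

$$\begin{aligned}
p((\sigma'',\sigma'''), (\sigma,\sigma')) &= \sum_{\sigma''''} p((\sigma'', \sigma''''); \sigma''')\, p((\sigma'', \sigma'''') \rightarrow (\sigma, \sigma'); \sigma''') \\
&= \delta_{\sigma''_{NC}\sigma_{NC}} \left( \frac{\lambda \epsilon}{2(S-1)} \right)^{l_C} \left( \frac{(1-\lambda)\epsilon}{S-1} \right)^{L_{NC}} \sum_{l_{C,1}=0}^{L_C - l_C} \binom{L_C - l_C}{l_{C,1}} (S-1)^{l_{C,1}} \left( \frac{\lambda \epsilon}{2(S-1)} \right)^{l_{C,1}} (1 - \epsilon)^{L_C - l_C - l_{C,1}} \\
&= \delta_{\sigma''_{NC}\sigma_{NC}} \left( \frac{\lambda \epsilon}{2(S-1)} \right)^{l_C} \left( \frac{(1-\lambda)\epsilon}{S-1} \right)^{L_{NC}} \left( 1 - \epsilon \left(1 - \frac{\lambda}{2}\right) \right)^{L_C - l_C}
\end{aligned} \tag{12}$$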



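Note that $L_C = L - L_{NC}$, and note that since $L_{NC}$ is simply the number of positions where $\sigma$ and $\sigma'$ are not complementary, it follows that $L_{NC} = D_H(\sigma, \bar{\sigma}')$. Therefore, our final formula is,

$$p((\sigma'',\sigma'''), (\sigma,\sigma')) = \delta_{\sigma''_{NC}\sigma_{NC}} \left( \frac{\lambda_{\{\sigma'',\sigma'''\}} \epsilon_{\{\sigma'',\sigma'''\}}}{2(S-1)} \right)^{D_H(\sigma''_C, \sigma_C)} \left( \frac{(1 - \lambda_{\{\sigma'',\sigma'''\}}) \epsilon_{\{\sigma'',\sigma'''\}}}{S-1} \right)^{D_H(\sigma, \bar{\sigma}')} \left( 1 - \epsilon_{\{\sigma'',\sigma'''\}} \left(1 - \frac{\lambda_{\{\sigma'',\sigma'''\}}}{2}\right) \right)^{L - D_H(\sigma, \bar{\sigma}') - D_H(\sigma''_C, \sigma_C)} \tag{13}$$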
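As a sanity check on Eq. (13), the following Python sketch evaluates the transition probability position by position and confirms numerically that the probabilities of all possible outcomes $(\sigma, \sigma')$ sum to one. It is a minimal sketch under stated assumptions: bases are encoded as integers $0, \ldots, S-1$ with complement $S - 1 - b$ (so $S$ must be even), $\epsilon$ and $\lambda$ are taken to be genome independent, and the function names are hypothetical.

```python
from itertools import product


def complement(strand, S=4):
    """Antiparallel complement; base b is assumed to pair with S - 1 - b."""
    return tuple(S - 1 - b for b in reversed(strand))


def transition_probability(parent, sigma, sigma_p, eps, lam, S=4):
    """Probability that `parent` becomes sigma with daughter strand sigma_p."""
    # complement(sigma_p)[i] is the base sigma[i] must equal for the pair to
    # be complementary at position i of sigma.
    aligned = complement(sigma_p, S)
    prob = 1.0
    for b_parent, b, b_needed in zip(parent, sigma, aligned):
        if b == b_needed:
            if b_parent == b:
                # Correct pairing, or a mismatch repaired on the daughter side.
                prob *= 1 - eps * (1 - lam / 2)
            else:
                # Specific mismatch, then repair of the parent base.
                prob *= lam * eps / (2 * (S - 1))
        else:
            if b_parent != b:
                return 0.0  # an unrepaired site cannot change the parent base
            prob *= (1 - lam) * eps / (S - 1)  # specific mismatch left unrepaired
    return prob


S, parent = 4, (0, 1)
strands = list(product(range(S), repeat=len(parent)))
total = sum(transition_probability(parent, s, sp, 0.1, 0.3, S)
            for s in strands for sp in strands)
print(total)  # should be 1 up to rounding error
```

The three branches correspond to the three factors of Eq. (13): complementary sites where the parent base is changed, non-complementary sites, and complementary sites where the parent base is preserved.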



For the remainder of this paper, we will assume that $\epsilon_{\{\sigma,\sigma'\}}$ and $\lambda_{\{\sigma,\sigma'\}}$ are genome independent, and hence may be denoted by $\epsilon$ and $\lambda$ (unless otherwise indicated).



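C. Obtaining the $\lambda = 1$ semiconservative equations

When $\lambda = 1$, it follows that $p((\sigma'',\sigma'''), (\sigma,\sigma')) = \delta_{\sigma'\bar{\sigma}}\, p((\sigma'',\sigma'''), (\sigma,\bar{\sigma}))$, since, with perfect lesion repair, all post-replication lesions are removed. Therefore, if $\sigma' \neq \bar{\sigma}$, then,

$$\frac{dy_{(\sigma,\sigma')}}{dt} = -\left(\kappa_{(\sigma,\sigma')} + \bar{\kappa}(t)\right) y_{(\sigma,\sigma')} \tag{14}$$

This implies that genomes with lesions will eventually disappear from the population. Furthermore, if an initial population of genomes is lesion-free, then no lesions will appear in the population, hence in such a case we may take $y_{(\sigma,\sigma')} = 0$ for $\sigma' \neq \bar{\sigma}$, and restrict our dynamics to the space of complementary ordered sequence-pairs, denoted $\{(\sigma, \bar{\sigma})\}$. We then have

$$\frac{dy_{(\sigma,\bar{\sigma})}}{dt} = -\left(\kappa_{(\sigma,\bar{\sigma})} + \bar{\kappa}(t)\right) y_{(\sigma,\bar{\sigma})} + \sum_{(\sigma',\bar{\sigma}')} \kappa_{(\sigma',\bar{\sigma}')}\, y_{(\sigma',\bar{\sigma}')} \times \left[ p((\sigma',\bar{\sigma}'), (\sigma,\bar{\sigma})) + p((\sigma',\bar{\sigma}'), (\bar{\sigma},\sigma)) \right] \tag{15}$$

Now, for $\lambda = 1$ note that,

$$p((\sigma',\bar{\sigma}'), (\sigma,\bar{\sigma})) = \left( \frac{\epsilon_{\{\sigma',\bar{\sigma}'\}}}{2(S-1)} \right)^{D_H(\sigma',\sigma)} \left( 1 - \frac{\epsilon_{\{\sigma',\bar{\sigma}'\}}}{2} \right)^{L - D_H(\sigma',\sigma)} \tag{16}$$

Note then that since $D_H(\bar{\sigma}',\bar{\sigma}) = D_H(\sigma',\sigma)$, we have that $p((\bar{\sigma}',\sigma'), (\bar{\sigma},\sigma)) = p((\sigma',\bar{\sigma}'), (\sigma,\bar{\sigma}))$, and so,

$$\begin{aligned}
\frac{dy_{(\sigma,\bar{\sigma})}}{dt} &= -\left(\kappa_{(\sigma,\bar{\sigma})} + \bar{\kappa}(t)\right) y_{(\sigma,\bar{\sigma})} + \sum_{\sigma'} \kappa_{(\sigma',\bar{\sigma}')}\, y_{(\sigma',\bar{\sigma}')}\, p((\sigma',\bar{\sigma}'), (\sigma,\bar{\sigma})) + \sum_{\sigma'} \kappa_{(\bar{\sigma}',\sigma')}\, y_{(\bar{\sigma}',\sigma')}\, p((\bar{\sigma}',\sigma'), (\bar{\sigma},\sigma)) \\
&= -\left(\kappa_{(\sigma,\bar{\sigma})} + \bar{\kappa}(t)\right) y_{(\sigma,\bar{\sigma})} + 2 \sum_{\sigma'} \kappa_{(\sigma',\bar{\sigma}')}\, y_{(\sigma',\bar{\sigma}')}\, p((\sigma',\bar{\sigma}'), (\sigma,\bar{\sigma}))
\end{aligned} \tag{17}$$

Defining $y_\sigma = y_{(\sigma,\bar{\sigma})}$, $\epsilon_\sigma = \epsilon_{\{\sigma,\bar{\sigma}\}}$, and $\kappa_\sigma = \kappa_{(\sigma,\bar{\sigma})}$ gives,

$$\frac{dy_\sigma}{dt} = -\left(\kappa_\sigma + \bar{\kappa}(t)\right) y_\sigma + 2 \sum_{\sigma'} \kappa_{\sigma'}\, y_{\sigma'} \left( \frac{\epsilon_{\sigma'}}{2(S-1)} \right)^{D_H(\sigma',\sigma)} \left( 1 - \frac{\epsilon_{\sigma'}}{2} \right)^{L - D_H(\sigma',\sigma)} \tag{18}$$

which are exactly the original semiconservative equations derived in [27].

D. $\lambda = 0$ equations

Before concluding this section, we will show that when lesion repair is turned off, then the semiconservative quasispecies equations can be transformed into equations which are similar in form to the conservative quasispecies equations. This is essentially a rederivation of a result of Brumer and Shakhnovich [30], done with our sequence-pair formalism.

For this derivation, we make the assumption that $\epsilon_{\{\sigma,\sigma'\}}$ is a constant $\epsilon$ for all genomes. This implies that $p((\sigma'', \sigma''''); \sigma''')$ does not depend on $\sigma'''$, hence the term may be dropped from the notation. Since lesion repair is turned off, we have $p((\sigma'',\sigma'''), (\sigma,\sigma')) = \delta_{\sigma''\sigma}\, p((\sigma, \sigma'))$. This gives,

$$\frac{dy_{(\sigma,\sigma')}}{dt} = -\left(\kappa_{(\sigma,\sigma')} + \bar{\kappa}(t)\right) y_{(\sigma,\sigma')} + \sum_{\sigma''} \kappa_{(\sigma,\sigma'')}\, y_{(\sigma,\sigma'')}\, p((\sigma, \sigma')) + \sum_{\sigma''} \kappa_{(\sigma',\sigma'')}\, y_{(\sigma',\sigma'')}\, p((\sigma', \sigma)) \tag{19}$$

Now, define $y_\sigma = \sum_{\sigma'} y_{(\sigma,\sigma')}$. Also, define $\kappa_\sigma = \sum_{\sigma'} \kappa_{(\sigma,\sigma')} y_{(\sigma,\sigma')} / y_\sigma$. We then have,

$$\begin{aligned}
\frac{dy_\sigma}{dt} &= -\left(\kappa_\sigma + \bar{\kappa}(t)\right) y_\sigma + \kappa_\sigma y_\sigma + \sum_{\sigma'} \kappa_{\sigma'}\, y_{\sigma'}\, p((\sigma', \sigma)) \\
&= \sum_{\sigma'} \kappa_{\sigma'}\, y_{\sigma'}\, p((\sigma', \sigma)) - \bar{\kappa}(t)\, y_\sigma
\end{aligned} \tag{20}$$

where we have used the fact that $\sum_{\sigma'} p((\sigma, \sigma')) = 1$.

Note that we have transformed the semiconservative quasispecies equations into a set of equations that look like the conservative equations, the key difference being that the fitnesses $\kappa_\sigma$ are concentration-dependent. However, it is possible to show that when the fitness depends on only one of the strands, then the conservative equations are obtained exactly (this will be done later in the paper).

III. THE "MASTER" GENOME FITNESS 
LANDSCAPE 

A. Infinite sequence length equations 

We will now develop the infinite sequence length equations for a class of fitness landscapes defined by what we call a "master" genome $\{\sigma_0, \bar{\sigma}_0\}$. A subclass of these landscapes is a generalization of the single-fitness-peak landscape [3, 27], which is the simplest landscape for which analytical results are obtainable. We will solve for the equilibrium mean fitness and the error threshold associated with this class of landscapes in the next section.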

Before proceeding, we note that the infinite sequence length equations are taken with $\mu = L\epsilon$ held constant. Because the probability of correct daughter strand synthesis is $(1 - \epsilon)^L \rightarrow e^{-\mu}$ as $L \rightarrow \infty$, holding $\mu$ constant amounts to fixing the genome replication fidelity in the limit of infinite sequence length.
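The limit $(1 - \epsilon)^L \rightarrow e^{-\mu}$ at fixed $\mu = L\epsilon$ is easy to check numerically. The short Python sketch below is only an illustration; the function name is hypothetical.

```python
import math


def correct_synthesis_probability(mu: float, L: int) -> float:
    """Probability of error-free daughter strand synthesis for length L,
    with the per-base mismatch probability set to eps = mu / L."""
    eps = mu / L
    return (1 - eps) ** L


mu = 1.0
for L in (10, 1_000, 1_000_000):
    print(L, correct_synthesis_probability(mu, L), math.exp(-mu))
```

As $L$ grows with $\mu$ fixed, the finite-length value approaches $e^{-\mu}$, with a relative deviation of order $\mu^2 / 2L$.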

The "master" genome $\{\sigma_0, \bar{\sigma}_0\}$ gives rise to the ordered sequence pairs $(\sigma_0, \bar{\sigma}_0)$ and $(\bar{\sigma}_0, \sigma_0)$. In the limit of infinite sequence length, it is possible to show that, with probability 1, the sequences $\sigma_0$ and $\bar{\sigma}_0$ become infinitely separated from each other, i.e. $D_H(\sigma_0, \bar{\sigma}_0) \rightarrow \infty$ [27]. Thus, we may regard $(\sigma_0, \bar{\sigma}_0)$ and $(\bar{\sigma}_0, \sigma_0)$ as infinitely separated from each other in the ordered sequence pair space.

The infinite separation between $\sigma_0$ and $\bar{\sigma}_0$ allows a division of the sequence pairs into three classes. A sequence pair $(\sigma, \sigma')$ is said to be of the first class if $D_H(\sigma, \sigma_0)$ and $D_H(\sigma', \bar{\sigma}_0)$ are both finite. A sequence pair $(\sigma, \sigma')$ is said to be of the second class if $D_H(\sigma, \bar{\sigma}_0)$ and $D_H(\sigma', \sigma_0)$ are both finite. Finally, a sequence pair not belonging to either one of the first two classes is said to belong to the third class. Using the Triangle Inequality, it is readily shown that a sequence pair cannot belong to more than one class.

A given sequence pair $(\sigma, \sigma')$ of the first class can be characterized by four parameters, denoted $l_C$, $l_L$, $l_R$, and $l_B$. The first parameter, $l_C$, denotes the number of positions where $\sigma$ and $\sigma'$ are complementary, yet differ from the corresponding positions in $\sigma_0$ and $\bar{\sigma}_0$, respectively. The second parameter, $l_L$, denotes the number of positions where $\sigma$ differs from $\sigma_0$, but the complementary positions in $\sigma'$ are equal to the corresponding ones in $\bar{\sigma}_0$. The third parameter, $l_R$, denotes the number of positions where $\sigma$ is equal to the ones in $\sigma_0$, but the complementary positions in $\sigma'$ differ from the corresponding ones in $\bar{\sigma}_0$. Finally, the fourth parameter, $l_B$, denotes the number of positions where $\sigma$ and $\sigma'$ are not complementary, and also differ from the corresponding positions in $\sigma_0$ and $\bar{\sigma}_0$, respectively. A sequence pair $(\sigma, \sigma')$ of the second class may be similarly characterized (except $\sigma_0$ and $\bar{\sigma}_0$ are swapped in the definitions given above).
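The four parameters can be computed mechanically for any finite-length pair, which may help in reading the definitions. The Python sketch below is an illustration under stated assumptions: bases are encoded as integers $0, \ldots, S-1$ with complement $S - 1 - b$, strands are tuples, and the function name `pair_parameters` is hypothetical.

```python
def pair_parameters(sigma, sigma_p, sigma0, S=4):
    """Return (l_C, l_L, l_R, l_B) of the ordered pair (sigma, sigma_p)
    relative to the master pair (sigma0, complement of sigma0).

    Because the strands are antiparallel, position i of sigma faces
    position L - 1 - i of sigma_p.
    """
    L = len(sigma0)
    l_C = l_L = l_R = l_B = 0
    for i in range(L):
        facing = sigma_p[L - 1 - i]
        complementary = facing == S - 1 - sigma[i]
        sigma_differs = sigma[i] != sigma0[i]
        partner_differs = facing != S - 1 - sigma0[i]
        if complementary:
            l_C += sigma_differs      # complementary site that departs from the master
        elif sigma_differs and partner_differs:
            l_B += 1                  # lesion where both strands depart from the master
        elif sigma_differs:
            l_L += 1                  # lesion where only sigma departs
        else:
            l_R += 1                  # lesion where only sigma_p departs
    return l_C, l_L, l_R, l_B


sigma0 = (0, 1, 2)
master_partner = (1, 2, 3)            # antiparallel complement of sigma0
print(pair_parameters(sigma0, master_partner, sigma0))      # (0, 0, 0, 0)
print(pair_parameters((1, 1, 2), master_partner, sigma0))   # (0, 1, 0, 0)
```

Every position falls into exactly one of five categories (unchanged, or one of the four counted types), so $l_C + l_L + l_R + l_B \leq L$.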

We assume that the fitness of a given sequence pair of the first class is determined by $l_C$, $l_L$, $l_R$, and $l_B$, hence we may write that $\kappa_{(\sigma,\sigma')} = \kappa_{(l_C,l_L,l_R,l_B)}$. The fitness of a sequence pair $(\sigma, \sigma')$ of the second class is determined by noting that $(\sigma', \sigma)$ is of the first class, and that $\kappa_{(\sigma,\sigma')} = \kappa_{(\sigma',\sigma)}$. We take the third class sequence pairs to be unviable, with a first-order growth rate of 1.

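We also assume that $\kappa_{(l_C,l_L,l_R,l_B)} = \kappa_{(l_C,l_R,l_L,l_B)}$. This is a natural assumption to make if one assumes symmetry between the two master strands $\sigma_0$ and $\bar{\sigma}_0$. This assumption also implies that $\kappa_{(\bar{\sigma},\bar{\sigma}')} = \kappa_{(\sigma,\sigma')}$. To see this, let us first suppose that $(\sigma, \sigma')$ is of the first class, and is characterized by the parameters $l_C$, $l_L$, $l_R$, and $l_B$. Then $(\bar{\sigma}', \bar{\sigma})$ is also of the first class. Because taking the complement of a sequence essentially amounts to a relabelling of the bases defined by a one-to-one map, and to a reversal in the sequence direction, it follows that $(\bar{\sigma}, \bar{\sigma}')$ is a sequence pair of the second class, characterized by the parameters $l_C$, $l_L$, $l_R$, and $l_B$. Therefore, $(\bar{\sigma}', \bar{\sigma})$ is characterized by the parameters $l_C$, $l_R$, $l_L$, and $l_B$, and so $\kappa_{(\bar{\sigma},\bar{\sigma}')} = \kappa_{(\bar{\sigma}',\bar{\sigma})} = \kappa_{(l_C,l_R,l_L,l_B)} = \kappa_{(l_C,l_L,l_R,l_B)} = \kappa_{(\sigma,\sigma')}$.

If $(\sigma, \sigma')$ is of the second class, then $(\sigma', \sigma)$ is of the first class. We then have $\kappa_{(\bar{\sigma},\bar{\sigma}')} = \kappa_{(\bar{\sigma}',\bar{\sigma})} = \kappa_{(\sigma',\sigma)} = \kappa_{(\sigma,\sigma')}$.

Finally, if $(\sigma, \sigma')$ is of the third class, then using the identity $D_H(\bar{\sigma}_1, \bar{\sigma}_2) = D_H(\sigma_1, \sigma_2)$ we can show that $(\bar{\sigma}, \bar{\sigma}')$ is also of the third class. Therefore, $\kappa_{(\bar{\sigma},\bar{\sigma}')} = 1 = \kappa_{(\sigma,\sigma')}$.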

Based on our formula for $p((\sigma'', \sigma'''), (\sigma, \sigma'))$, we have that $p((\bar{\sigma}'',\bar{\sigma}'''), (\bar{\sigma},\bar{\sigma}')) = p((\sigma'',\sigma'''), (\sigma,\sigma'))$. This result again follows from the fact that taking the complement of a sequence essentially amounts to a relabelling of the bases, and a change in the direction that the sequence is read. Thus, all Hamming distances in Eq. (13) are unchanged.

Therefore, with this choice of landscape, and with a genome-independent $\epsilon$ and $\lambda$, we have, assuming that our quasispecies population is initially lesion-free (which is done by taking $y_{(\sigma_0,\bar{\sigma}_0)} = y_{(\bar{\sigma}_0,\sigma_0)} = 1/2$, for instance) that $y_{(\bar{\sigma},\bar{\sigma}')} = y_{(\sigma,\sigma')}$, and so Eq. (9) applies.

In the limit of infinite sequence length, we claim that we may treat the quasispecies dynamics about the "master" pairs $(\sigma_0, \bar{\sigma}_0)$ and $(\bar{\sigma}_0, \sigma_0)$ independently of one another. That is, we may treat the dynamics as arising from essentially two separate quasispecies living on separate fitness landscapes. The heuristic reason for this is as follows: Because $\sigma_0$ and $\bar{\sigma}_0$ become infinitely separated from each other in the limit of infinite sequence length, it follows that if a given $\sigma''$ is of finite Hamming distance to either $\sigma_0$ (or $\bar{\sigma}_0$), then after daughter strand synthesis and lesion repair we obtain a $(\sigma, \sigma')$ which is of finite Hamming distance to $(\sigma_0, \bar{\sigma}_0)$ (or $(\bar{\sigma}_0, \sigma_0)$). This is because the probability of making an infinite number of replication mistakes is zero, and so if $\sigma''$ is of finite Hamming distance to $\sigma_0$, then it remains so after replication, and its complement is also of finite Hamming distance to $\bar{\sigma}_0$.

We allow our system to come to equilibrium from the initial condition $y_{(\sigma_0,\bar{\sigma}_0)} = y_{(\bar{\sigma}_0,\sigma_0)} = 1/2$ (equivalent to $x_{\{\sigma_0,\bar{\sigma}_0\}} = 1$). We choose this initial condition because it guarantees convergence to the unique stable equilibrium solution of the model. The reason for this is that all genomes are mutationally accessible from $\{\sigma_0, \bar{\sigma}_0\}$. Because of the neglect of backmutations in the limit of infinite sequence length, other initial conditions may lead to different regions of the genome space becoming mutationally disconnected from each other, preventing proper equilibration from occurring.

Thus, because our genome distribution is initially localized about $(\sigma_0, \bar{\sigma}_0)$ and $(\bar{\sigma}_0, \sigma_0)$, we need only consider the local dynamics about each master sequence pair, and treat the dynamics about each pair separately.

We now claim that, for sequence pairs $(\sigma, \sigma')$ of the first class, $y_{(\sigma,\sigma')}$ depends only on $l_C$, $l_L$, $l_R$, and $l_B$. We note that this certainly holds at $t = 0$, given our initial conditions. In order to prove that this holds at all times, we need to show that, if $y_{(\sigma,\sigma')}$ depends only on $l_C$, $l_L$, $l_R$, and $l_B$ at some time $t$, then $dy_{(\sigma,\sigma')}/dt$ depends only on $l_C$, $l_L$, $l_R$, and $l_B$. In doing so, we will be simultaneously deriving the dynamical form of the quasispecies equations appropriate for our choice of fitness landscapes. We should note that our "proof" will not be strictly rigorous, since it will consider finite sequence length equations while still assuming that the first class and second class sequence pair dynamics may be treated as separate quasispecies. Nevertheless, since we are passing to the limit $L \rightarrow \infty$, we can assume that $L$ is sufficiently large to make the correction terms to our equations negligible, and eventually 0, in the limit.
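So, suppose that at some time $t$, for all sequence pairs $(\sigma, \sigma')$ we have that $y_{(\sigma,\sigma')}$ depends only on $l_C$, $l_L$, $l_R$, and $l_B$. Then we may write $y_{(\sigma,\sigma')} = y_{(l_C,l_L,l_R,l_B)}$, and so, Eq. (9) gives,

$$\frac{dy_{(l_C,l_L,l_R,l_B)}}{dt} = -\left(\kappa_{(l_C,l_L,l_R,l_B)} + \bar{\kappa}(t)\right) y_{(l_C,l_L,l_R,l_B)} + \sum_{(\sigma'',\sigma''')} \kappa_{(\sigma'',\sigma''')}\, y_{(\sigma'',\sigma''')}\, p((\sigma'',\sigma'''), (\sigma,\sigma')) + \sum_{(\sigma'',\sigma''')} \kappa_{(\sigma'',\sigma''')}\, y_{(\sigma'',\sigma''')}\, p((\sigma'',\sigma'''), (\bar{\sigma}',\bar{\sigma})) \tag{21}$$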

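We proceed as follows: Given $\sigma''_C$ and $\sigma_C$, then among the subset of positions where $\sigma_C$ and $\sigma_0$ are identical, let $l_{C,1}$ denote the number of positions where $\sigma''_C$ differs from $\sigma_C$. Among the subset of positions where $\sigma_C$ and $\sigma_0$ differ, let $l_{C,2}$ denote the number of positions where $\sigma''_C$ is identical to $\sigma_0$. Finally, where $\sigma_C$ differs from $\sigma_0$, let $l_{C,3}$ denote the number of positions where $\sigma''_C$ differs from both $\sigma_0$ and $\sigma_C$. It is clear that $D_H(\sigma''_C, \sigma_C) = l_{C,1} + l_{C,2} + l_{C,3}$. Furthermore, to have a nonzero value of $p((\sigma'',\sigma'''), (\sigma,\sigma'))$, we must have $\sigma''_{NC} = \sigma_{NC}$. Since the sequence pair $(\sigma, \sigma')$ consists of $l_L + l_R + l_B$ lesions, it follows that $L_{NC} = l_L + l_R + l_B$, giving,

$$p((\sigma'',\sigma'''), (\sigma,\sigma')) = \left( \frac{\lambda\epsilon}{2(S-1)} \right)^{l_{C,1}+l_{C,2}+l_{C,3}} \left( \frac{(1-\lambda)\epsilon}{S-1} \right)^{l_L+l_R+l_B} \left( 1 - \epsilon\left(1 - \frac{\lambda}{2}\right) \right)^{L - l_L - l_R - l_B - l_{C,1} - l_{C,2} - l_{C,3}} \tag{22}$$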



We now need to characterize the $\sigma'''$: Where $\sigma''$ differs from $\sigma_0$, let $l''_1$ denote the number of sites where $\sigma''$ and $\sigma'''$ are complementary, and $l''_2$ the number of sites where $\sigma'''$ is non-complementary to $\sigma''$ but differs from $\bar{\sigma}_0$. Let $l''_3$ denote the number of sites where $\sigma''$ is identical to $\sigma_0$, where $\sigma'''$ is non-complementary to $\sigma''$. Then we have, $l''_C = l''_1$, $l''_L = l_{C,1} + l_C + l_L + l_B - l_{C,2} - l''_1 - l''_2$, $l''_R = l''_3$, and $l''_B = l''_2$.

We define $C''(l_{C,1}, l_{C,2}, l_{C,3}; l_C, l_L, l_R, l_B)$ to be the number of $\sigma''$ characterized by $l_{C,1}$, $l_{C,2}$, and $l_{C,3}$. We have,

$$C''(l_{C,1}, l_{C,2}, l_{C,3}; l_C, l_L, l_R, l_B) = \binom{L - l_L - l_R - l_B - l_C}{l_{C,1}} \binom{l_C}{l_{C,2}} \binom{l_C - l_{C,2}}{l_{C,3}} (S-1)^{l_{C,1}} (S-2)^{l_{C,3}} \tag{23}$$

where $l_{C,1}$ ranges from $0$ to $L - l_L - l_R - l_B - l_C$, $l_{C,2}$ ranges from $0$ to $l_C$, and $l_{C,3}$ ranges from $0$ to $l_C - l_{C,2}$.
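One consistency check on this count: summing the number of parent strands $\sigma''$ over all allowed values of $l_{C,1}$, $l_{C,2}$, and $l_{C,3}$ should give $S^{L_C}$ with $L_C = L - l_L - l_R - l_B$, since the bases of $\sigma''$ are unconstrained on the complementary positions and fixed elsewhere. The Python sketch below performs this check for the product-of-binomials form $\binom{L - l_L - l_R - l_B - l_C}{l_{C,1}} \binom{l_C}{l_{C,2}} \binom{l_C - l_{C,2}}{l_{C,3}} (S-1)^{l_{C,1}} (S-2)^{l_{C,3}}$; the function names are hypothetical.

```python
from math import comb


def parent_count(l1, l2, l3, L, lC, lL, lR, lB, S=4):
    """Number of parent strands with given (l_C1, l_C2, l_C3)."""
    free = L - lL - lR - lB - lC   # complementary sites where sigma equals the master
    return (comb(free, l1) * comb(lC, l2) * comb(lC - l2, l3)
            * (S - 1) ** l1 * (S - 2) ** l3)


def total_parents(L, lC, lL, lR, lB, S=4):
    """Sum of parent_count over the allowed ranges of l_C1, l_C2, l_C3."""
    free = L - lL - lR - lB - lC
    return sum(parent_count(l1, l2, l3, L, lC, lL, lR, lB, S)
               for l1 in range(free + 1)
               for l2 in range(lC + 1)
               for l3 in range(lC - l2 + 1))


L, lC, lL, lR, lB, S = 6, 2, 1, 1, 0, 4
print(total_parents(L, lC, lL, lR, lB, S), S ** (L - lL - lR - lB))
```

The two printed numbers agree, as follows from applying the binomial theorem to each of the three sums in turn.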

For each such choice of $\sigma''$, we define $C'''(l''_1, l''_2, l''_3; l_C, l_L, l_R, l_B, l_{C,1}, l_{C,2}, l_{C,3})$ to be the number of $\sigma'''$ characterized by $l''_1$, $l''_2$, and $l''_3$. We have,

$$C'''(l''_1, l''_2, l''_3; l_C, l_L, l_R, l_B, l_{C,1}, l_{C,2}, l_{C,3}) = \binom{l_{C,1} + l_L + l_B + l_C - l_{C,2}}{l''_1} \binom{l_{C,1} + l_L + l_B + l_C - l_{C,2} - l''_1}{l''_2} \times \binom{L - l_{C,1} - l_L - l_B - l_C + l_{C,2}}{l''_3} (S-2)^{l''_2} (S-1)^{l''_3} \tag{24}$$



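We may perform a similar analysis on $(\bar{\sigma}', \bar{\sigma})$, which is characterized by the parameters $l_C$, $l_R$, $l_L$, and $l_B$. The quasispecies equations then become

$$\begin{aligned}
\frac{dy_{(l_C,l_L,l_R,l_B)}}{dt} &= -\left(\kappa_{(l_C,l_L,l_R,l_B)} + \bar{\kappa}(t)\right) y_{(l_C,l_L,l_R,l_B)} \\
&\quad + \sum_{l_{C,1}=0}^{L-l_L-l_R-l_B-l_C}\ \sum_{l_{C,2}=0}^{l_C}\ \sum_{l_{C,3}=0}^{l_C-l_{C,2}}\ \sum_{l''_1=0}^{l_{C,1}+l_L+l_B+l_C-l_{C,2}}\ \sum_{l''_2=0}^{l_{C,1}+l_L+l_B+l_C-l_{C,2}-l''_1}\ \sum_{l''_3=0}^{L-l_{C,1}-l_L-l_B-l_C+l_{C,2}} \\
&\qquad C''(l_{C,1}, l_{C,2}, l_{C,3}; l_C, l_L, l_R, l_B)\, C'''(l''_1, l''_2, l''_3; l_C, l_L, l_R, l_B, l_{C,1}, l_{C,2}, l_{C,3}) \times \\
&\qquad \kappa_{(l''_1,\, l_{C,1}+l_L+l_B+l_C-l_{C,2}-l''_1-l''_2,\, l''_3,\, l''_2)}\ y_{(l''_1,\, l_{C,1}+l_L+l_B+l_C-l_{C,2}-l''_1-l''_2,\, l''_3,\, l''_2)} \times \\
&\qquad \left( \frac{\lambda\epsilon}{2(S-1)} \right)^{l_{C,1}+l_{C,2}+l_{C,3}} \left( \frac{(1-\lambda)\epsilon}{S-1} \right)^{l_L+l_R+l_B} \left( 1 - \epsilon\left(1 - \frac{\lambda}{2}\right) \right)^{L-l_L-l_R-l_B-l_{C,1}-l_{C,2}-l_{C,3}} \\
&\quad + \left[\, l_L \leftrightarrow l_R \,\right]
\end{aligned} \tag{25}$$

where $[\, l_L \leftrightarrow l_R \,]$ denotes the preceding six-fold sum with $l_L$ and $l_R$ interchanged throughout (in the summation limits, in the arguments of $C''$ and $C'''$, and in the indices of $\kappa$ and $y$); it is the contribution of $p((\sigma'',\sigma'''), (\bar{\sigma}',\bar{\sigma}))$.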

We now use the binomial theorem and sum over $l_{C,3}$, giving,

$$\begin{aligned}
\frac{dy_{(l_C,l_L,l_R,l_B)}}{dt} &= -\left(\kappa_{(l_C,l_L,l_R,l_B)} + \bar{\kappa}(t)\right) y_{(l_C,l_L,l_R,l_B)} \\
&\quad + \sum_{l_{C,1}=0}^{L-l_L-l_R-l_B-l_C}\ \sum_{l_{C,2}=0}^{l_C}\ \sum_{l''_1=0}^{l_{C,1}+l_L+l_B+l_C-l_{C,2}}\ \sum_{l''_2=0}^{l_{C,1}+l_L+l_B+l_C-l_{C,2}-l''_1}\ \sum_{l''_3=0}^{L-l_{C,1}-l_L-l_B-l_C+l_{C,2}} \\
&\qquad \binom{L-l_L-l_R-l_B-l_C}{l_{C,1}} \binom{l_C}{l_{C,2}} \binom{l_{C,1}+l_L+l_B+l_C-l_{C,2}}{l''_1} \binom{l_{C,1}+l_L+l_B+l_C-l_{C,2}-l''_1}{l''_2} \times \\
&\qquad \binom{L-l_{C,1}-l_L-l_B-l_C+l_{C,2}}{l''_3} (S-1)^{l''_3} (S-2)^{l''_2} \times \\
&\qquad \kappa_{(l''_1,\, l_{C,1}+l_L+l_B+l_C-l_{C,2}-l''_1-l''_2,\, l''_3,\, l''_2)}\ y_{(l''_1,\, l_{C,1}+l_L+l_B+l_C-l_{C,2}-l''_1-l''_2,\, l''_3,\, l''_2)} \times \\
&\qquad \left( \frac{\lambda\epsilon}{2} \right)^{l_{C,1}} \left( \frac{\lambda\epsilon}{2(S-1)} \right)^{l_{C,2}} \left( \frac{(1-\lambda)\epsilon}{S-1} \right)^{l_L+l_R+l_B} \times \\
&\qquad \left( 1 - \epsilon\left(1 - \frac{\lambda}{2}\right) \right)^{L-l_L-l_R-l_B-l_C-l_{C,1}} \left( 1 - \epsilon\left(1 - \lambda\left(1 - \frac{1}{2(S-1)}\right)\right) \right)^{l_C-l_{C,2}} \\
&\quad + \left[\, l_L \leftrightarrow l_R \,\right]
\end{aligned} \tag{26}$$

where $[\, l_L \leftrightarrow l_R \,]$ denotes the preceding five-fold sum with $l_L$ and $l_R$ interchanged throughout (in the summation limits, in the binomial coefficients, and in the indices of $\kappa$ and $y$).

Note that this expression depends only on $l_C$, $l_L$, $l_R$, and $l_B$, hence our claim that $y_{(\sigma,\sigma')}$ only depends on $l_C$, $l_L$, $l_R$, and $l_B$ is established. We now proceed to formally take the $L \rightarrow \infty$ limit.
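The binomial-theorem step used to eliminate $l_{C,3}$ can be checked numerically. With $a = \lambda\epsilon / 2(S-1)$ and $b = 1 - \epsilon(1 - \lambda/2)$, the sum over $l_{C,3}$ of $\binom{n}{l_{C,3}} (S-2)^{l_{C,3}} a^{l_{C,3}} b^{n - l_{C,3}}$, with $n = l_C - l_{C,2}$, should collapse to $\left(1 - \epsilon(1 - \lambda(1 - 1/2(S-1)))\right)^n$. The Python sketch below compares the two sides; the function names are hypothetical.

```python
from math import comb


def summed_factor(n, eps, lam, S=4):
    """Explicit sum over l_C3 = 0..n of the l_C3-dependent factors."""
    a = lam * eps / (2 * (S - 1))      # mismatch to a specific base, parent side repaired
    b = 1 - eps * (1 - lam / 2)        # parent base preserved
    return sum(comb(n, k) * (S - 2) ** k * a ** k * b ** (n - k)
               for k in range(n + 1))


def closed_form(n, eps, lam, S=4):
    """Result of applying the binomial theorem to the sum above."""
    return (1 - eps * (1 - lam * (1 - 1 / (2 * (S - 1))))) ** n


print(summed_factor(5, 0.2, 0.7), closed_form(5, 0.2, 0.7))
```

The agreement follows from $(S-2)a + b = 1 - \epsilon\left(1 - \lambda\left(1 - \frac{1}{2(S-1)}\right)\right)$.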

Because $y_{(\sigma,\sigma')}$ depends only on $l_C$, $l_L$, $l_R$, and $l_B$, we can sum over the population fractions of all first class sequence pairs characterized by a given set of $l_C$, $l_L$, $l_R$, and $l_B$, and reexpress the quasispecies dynamics in terms of these quantities. To do this, we define $C(l_C, l_L, l_R, l_B)$ to be the number of sequence pairs characterized by $l_C$, $l_L$, $l_R$, $l_B$, and note that,

$$C(l_C, l_L, l_R, l_B) = \frac{L!}{l_C!\, l_L!\, l_R!\, l_B!\, (L - l_C - l_L - l_R - l_B)!}\, (S-1)^{l_C + l_L + l_R + l_B} (S-2)^{l_B} \tag{27}$$
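The count of sequence pairs can be sanity-checked for small $L$. The Python sketch below assumes that the count takes the multinomial form $\frac{L!}{l_C!\, l_L!\, l_R!\, l_B!\, (L - l_C - l_L - l_R - l_B)!} (S-1)^{l_C+l_L+l_R+l_B} (S-2)^{l_B}$, which is what a direct site-by-site enumeration gives (an assumption of this sketch, not a quotation). It verifies that the counts over all parameter sets add up to the total number $S^{2L}$ of ordered pairs. The function names are hypothetical.

```python
from math import factorial


def pair_count(lC, lL, lR, lB, L, S=4):
    """Number of ordered pairs with parameters (l_C, l_L, l_R, l_B),
    assuming the multinomial form described in the lead-in."""
    rest = L - lC - lL - lR - lB
    multinomial = factorial(L) // (factorial(lC) * factorial(lL) * factorial(lR)
                                   * factorial(lB) * factorial(rest))
    return multinomial * (S - 1) ** (lC + lL + lR + lB) * (S - 2) ** lB


def total_pairs(L, S=4):
    """Sum of pair_count over all parameter sets with l_C + l_L + l_R + l_B <= L."""
    return sum(pair_count(lC, lL, lR, lB, L, S)
               for lC in range(L + 1)
               for lL in range(L + 1 - lC)
               for lR in range(L + 1 - lC - lL)
               for lB in range(L + 1 - lC - lL - lR))


print(total_pairs(3), 4 ** 6)  # both equal 4096
```

The form is symmetric under interchange of $l_L$ and $l_R$, which is the property used when reexpressing the dynamics in terms of total population fractions.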

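We let $z_{(l_C,l_L,l_R,l_B)}$ denote the total population fraction of first class sequence pairs characterized by $l_C$, $l_L$, $l_R$, and $l_B$, so that $z_{(l_C,l_L,l_R,l_B)} = C(l_C,l_L,l_R,l_B)\, y_{(l_C,l_L,l_R,l_B)}$. Using this expression, and using the fact that $C(l_C,l_L,l_R,l_B) = C(l_C,l_R,l_L,l_B)$, we may reexpress the quasispecies dynamics in terms of the $z_{(l_C,l_L,l_R,l_B)}$. After some algebra, the final result is,

$$\begin{aligned}
\frac{dz_{(l_C,l_L,l_R,l_B)}}{dt} &= -\left(\kappa_{(l_C,l_L,l_R,l_B)} + \bar{\kappa}(t)\right) z_{(l_C,l_L,l_R,l_B)} \\
&\quad + \sum_{l_{C,1}=0}^{L-l_L-l_R-l_B-l_C}\ \sum_{l_{C,2}=0}^{l_C}\ \sum_{l''_1=0}^{l_{C,1}+l_L+l_B+l_C-l_{C,2}}\ \sum_{l''_2=0}^{l_{C,1}+l_L+l_B+l_C-l_{C,2}-l''_1}\ \sum_{l''_3=0}^{L-l_{C,1}-l_L-l_B-l_C+l_{C,2}} \\
&\qquad \binom{l_L+l_B+l_{C,1}+l_C-l_{C,2}}{l_{C,1}} \binom{l_L+l_B+l_C-l_{C,2}}{l_C-l_{C,2}} \binom{l_L+l_B}{l_B} \times \\
&\qquad \binom{L-l_L-l_B-l_{C,1}-l_C+l_{C,2}}{l_{C,2}} \binom{L-l_L-l_B-l_{C,1}-l_C}{l_R} \times \\
&\qquad \left( \frac{\lambda\epsilon}{2(S-1)} \right)^{l_{C,1}} \left( \frac{\lambda\epsilon}{2} \right)^{l_{C,2}} \left( \frac{\epsilon(1-\lambda)}{S-1} \right)^{l_L} \left( \epsilon(1-\lambda) \right)^{l_R} \left( \frac{\epsilon(1-\lambda)(S-2)}{S-1} \right)^{l_B} \times \\
&\qquad \left( 1 - \epsilon\left(1 - \frac{\lambda}{2}\right) \right)^{L-l_L-l_R-l_B-l_C-l_{C,1}} \left( 1 - \epsilon\left(1 - \lambda\left(1 - \frac{1}{2(S-1)}\right)\right) \right)^{l_C-l_{C,2}} \times \\
&\qquad \kappa_{(l''_1,\, l_{C,1}+l_L+l_B+l_C-l_{C,2}-l''_1-l''_2,\, l''_3,\, l''_2)}\ z_{(l''_1,\, l_{C,1}+l_L+l_B+l_C-l_{C,2}-l''_1-l''_2,\, l''_3,\, l''_2)} \\
&\quad + \left[\, l_L \leftrightarrow l_R \,\right]
\end{aligned} \tag{28}$$

where $[\, l_L \leftrightarrow l_R \,]$ denotes the preceding five-fold sum with $l_L$ and $l_R$ interchanged throughout (in the summation limits, in the binomial coefficients, in the powers of $\epsilon$, and in the indices of $\kappa$ and $z$).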

Now, it may be shown in the limit of infinite sequence length that only the $l_{C,1} = 0$ terms contribute to the sum. This corresponds to the neglect of backmutations in the limit of infinite sequence length. The proof that the $l_{C,1} > 0$ terms may be neglected is fairly tedious, but is similar to the arguments given in [25, 27]. Therefore, we do not give details in this paper. Regarding the remaining terms, we may note that,

/L-Il-Ib-Ic + lc,2\ (^)'c,2(i _ g(i _ ^^^L-Il-Ir-Ib-Ic _^ J^(:^yc,2g-Ki-i) 
\ lc,2 / 2 2 Zc,2! 2 

(^-^-7j--^^)(e(i-A)y^^^(Mi-A)y^ 
(i_,(i_A(i-^^^))yo-^o.2^i 

Il+Ib+Ic- IcA [Il + Ib\ , e(l - A) (5 - 2) ^, . e(l - A) , ^ 



Ic-lca A I. )^ S-1 --(^y--W^..o (29) 

The last statement implies that genomes with $l_B > 0$ and genomes with $l_L$, $l_R$ simultaneously $> 0$ cannot be produced by replication. Therefore, if our initial population distribution is such that $z_{(l_C,l_L,l_R,l_B>0)} = 0$ and $z_{(l_C,l_L>0,l_R>0,l_B)} = 0$ (as is the case with our initial conditions), then we may assume that $z_{(l_C,l_L,l_R,l_B>0)} = 0$ and $z_{(l_C,l_L>0,l_R>0,l_B)} = 0$ at all times.

Putting everything together, we obtain the final form of the infinite sequence length equations.



dz, 



iic,o,o,o) 
dt 



dt 



Jt 



-(K(ic,OAO) + l^{t))z(lc, 0,0,0) 

i^=o ^' l'{=0 11^=0 

l^(l'{,lc-l'o-li,l'ifi)Hli,lc-l'o-li,l'4fi) 



- {l^Hc, I L, 0,0) + K{t))z^ic^i^^o,0) 

1 A '"^ 1 A / °° 

+— (//(l-A))'^e-^(i-^) Y 7m(y)'^ E E'^W'.'c-ifc-i'i',«^',o)^W'.ic-ifc-ii'.i^',o) 

i'c.=o ^' i'{=o q=o 

-(HIc,0,Ir,0) + Ht))Hlo,0,lR,0) 

+ — (M(l-A))'«e-'^(l-^) ^ 7m (^y^ E 12''(li'io-Va-l'^,l'^fl)ZiV^,lo-li.-l'^,V^fi) 

l'^=0 ^ l>^=0 iq=0 



(30) 



The reason why genomes where both $l_L$ and $l_R$ are nonzero, or where $l_B$ is nonzero, cannot be produced by replication, is as follows: Given a parent strand $\sigma$ which differs from $\sigma_0$ in $l$ places, the probability of correct daughter strand synthesis in these $l$ places is $(1 - \epsilon)^l = (1 - \mu/L)^l \to 1$ as $L \to \infty$. Therefore, any mismatches that occur will occur where $\sigma$ and $\sigma_0$ are identical. Wherever lesion repair does not occur, $\sigma$ remains identical to $\sigma_0$ in the final genome. The result is a sequence pair for which $l_L = l_B = 0$. Similarly, a parent strand $\sigma'$ which differs from $\bar{\sigma}_0$ in a finite number of positions produces a sequence pair for which $l_R = l_B = 0$. Therefore, as is reflected in the equations, it is impossible for replication to produce sequence pairs for which $l_L$ and $l_R$ are simultaneously nonzero, or for which $l_B$ is nonzero. Since the population fractions of these genomes are initially 0, they remain 0 for all time, hence we may simply assume that $z_{(l_C,l_L,l_R,l_B)} = 0$ if $l_L$ and $l_R$ are simultaneously nonzero, or if $l_B$ is nonzero.

These equations describe the quasispecies dynamics for the first-class sequence pairs. An analogous set of equations may be derived for the second-class sequence pairs, where we let $\tilde{z}_{(l_C,l_L,l_R,l_B)}$ denote the total population fraction of second-class sequence pairs characterized by the parameters $l_C$, $l_L$, $l_R$, and $l_B$. Note that a sequence pair $(\sigma, \sigma')$ is of the second class if and only if $(\sigma', \sigma)$ is of the first class. Therefore, since $y_{(\sigma,\sigma')} = y_{(\sigma',\sigma)}$, it follows that $\tilde{z}_{(l_C,l_L,l_R,l_B)} = z_{(l_C,l_R,l_L,l_B)}$.

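We can provide an expression for the mean fitness in terms of the $z_{(l_C,l_L,l_R,l_B)}$. First note that, since the total population fraction of the third-class sequence pairs is

$$1 - \sum_{l_C=0}^{\infty}\sum_{l_L=0}^{\infty}\sum_{l_R=0}^{\infty}\sum_{l_B=0}^{\infty}\left(z_{(l_C,l_L,l_R,l_B)} + \tilde{z}_{(l_C,l_L,l_R,l_B)}\right),$$

it follows that the mean fitness is given by,

$$\bar{\kappa}(t) = \sum_{l_C=0}^{\infty}\sum_{l_L=0}^{\infty}\sum_{l_R=0}^{\infty}\sum_{l_B=0}^{\infty}\left(\kappa_{(l_C,l_L,l_R,l_B)}\, z_{(l_C,l_L,l_R,l_B)} + \kappa_{(l_C,l_R,l_L,l_B)}\, \tilde{z}_{(l_C,l_L,l_R,l_B)}\right) + 1 - \sum_{l_C=0}^{\infty}\sum_{l_L=0}^{\infty}\sum_{l_R=0}^{\infty}\sum_{l_B=0}^{\infty}\left(z_{(l_C,l_L,l_R,l_B)} + \tilde{z}_{(l_C,l_L,l_R,l_B)}\right) \tag{31}$$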

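For our particular class of fitness landscapes, for which we have $\kappa_{(\sigma,\sigma')} = \kappa_{(\bar{\sigma}',\bar{\sigma})}$, we get $y_{(l_C,l_R,l_L,l_B)} = y_{(\bar{\sigma}',\bar{\sigma})} = y_{(\sigma,\sigma')} = y_{(l_C,l_L,l_R,l_B)}$, and so $z_{(l_C,l_R,l_L,l_B)} = z_{(l_C,l_L,l_R,l_B)}$. This allows us to reexpress the expression for the mean fitness as,

$$\bar{\kappa}(t) = 2\sum_{l_C=0}^{\infty}\sum_{l_L=0}^{\infty}\sum_{l_R=0}^{\infty}\sum_{l_B=0}^{\infty}\left(\kappa_{(l_C,l_L,l_R,l_B)} - 1\right) z_{(l_C,l_L,l_R,l_B)} + 1 \tag{32}$$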

IV. SOLUTION OF THE GENERALIZED 
SINGLE-FITNESS-PEAK LANDSCAPE 

A. Equilibrium mean fitness and the error 
catastrophe 

The simplest and most commonly studied landscape in the quasispecies model is known as the single-fitness-peak landscape. For the single-stranded RNA genomes modeled in the original quasispecies equations, this landscape is defined by a "master" genome $\sigma_0$ with a first-order growth rate constant $k > 1$, while all other genomes have a first-order growth rate constant of 1. Thus, the master genome is said to be viable, while all the other genomes are unviable.

For semiconservatively replicating, double-stranded DNA genomes, the single fitness peak landscape is defined by a "master" genome $\{\sigma_0, \bar{\sigma}_0\}$. When converting the quasispecies dynamics from the space of genomes to the space of single strands, the resulting single fitness peak landscape becomes a two-peak landscape with "master" sequences $\sigma_0$ and $\bar{\sigma}_0$.

For imperfect lesion repair, it is therefore also natural for us to first study the single-fitness-peak landscape. In this section, instead of considering a single-fitness-peak landscape where any change to the "master" genome $\{\sigma_0, \bar{\sigma}_0\}$ results in an unviable genome, we consider a "generalized" single fitness peak landscape, where the master genome can sustain a finite number of lesions and remain viable.



In the limit of infinite sequence length, the $l$-lesion landscape may therefore be defined as follows: For sequence pairs of the first class, we define $\kappa_{(l_C,l_L,l_R,l_B)} = k > 1$ if $l_C = 0$ and $l_L + l_R + l_B \le l$; otherwise $\kappa_{(l_C,l_L,l_R,l_B)} = 1$. The landscape of sequence pairs of the second class is of course defined by the landscape for sequence pairs of the first class, via $\kappa_{(\sigma,\sigma')} = \kappa_{(\sigma',\sigma)}$. All sequence pairs of the third class are unviable.

Because we may make the assumption that $z_{(l_C,l_L,l_R,l_B)} = 0$ if $l_B \neq 0$ or if both $l_L$ and $l_R$ are nonzero, we need only consider $z_{(l_C,l_L,0,0)}$ and $z_{(l_C,0,l_R,0)}$. Furthermore, by the symmetry of our landscape we have $z_{(l_C,l',0,0)} = z_{(l_C,0,l',0)}$.

We define the following quantities for use in our calculations:

$$z_0 = z_{(0,0,0,0)} + 2\sum_{l'=1}^{l} z_{(0,0,l',0)} \tag{33}$$

$$z_1 = \sum_{l'=0}^{l} z_{(0,0,l',0)} \tag{34}$$

$$z_2 = \sum_{l'=0}^{\infty} z_{(0,0,l',0)} \tag{35}$$



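We then have, from Eq. (30), that

$$\begin{aligned}
\frac{dz_{(0,0,0,0)}}{dt} &= -(k + \bar{\kappa}(t))\, z_{(0,0,0,0)} + 2e^{-\mu(1-\lambda/2)}\left[(k-1)z_1 + z_2\right] \\
\frac{dz_1}{dt} &= -(k + \bar{\kappa}(t))\, z_1 + e^{-\mu(1-\lambda/2)}\left(1 + f_l(\mu,\lambda)\right)\left[(k-1)z_1 + z_2\right] \\
\frac{dz_2}{dt} &= -\bar{\kappa}(t)\, z_2 + \left(e^{-\mu\lambda/2} + e^{-\mu(1-\lambda/2)} - 1\right)\left[(k-1)z_1 + z_2\right]
\end{aligned} \tag{36}$$

where $f_l(\mu,\lambda) = \sum_{l'=0}^{l} \frac{1}{l'!}\left[\mu(1-\lambda)\right]^{l'}$.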
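As a numerical sanity check on Eq. (36), the following minimal Python sketch integrates the three-variable system with a forward-Euler scheme and compares the long-time mean fitness with the larger root of $\bar{\kappa}^2 - A\bar{\kappa} - B = 0$. It assumes the reading $z_2 = \sum_{l'=0}^{\infty} z_{(0,0,l',0)}$, $\bar{\kappa} = 2(k-1)(2z_1 - z_{(0,0,0,0)}) + 1$, $A = k((1+f_l)e^{-\mu(1-\lambda/2)} - 1) - f_l e^{-\mu(1-\lambda/2)} + e^{-\mu\lambda/2} - 1$, and $B = k(e^{-\mu\lambda/2} + e^{-\mu(1-\lambda/2)} - 1)$. The function names, the step size, and the initial condition (a population consisting only of master genomes) are illustrative choices, not taken from the paper.

```python
import math

def f_l(mu, lam, l):
    """f_l(mu, lam) = sum_{j=0}^{l} [mu (1 - lam)]^j / j!  (finite l only)."""
    x = mu * (1.0 - lam)
    return sum(x**j / math.factorial(j) for j in range(l + 1))

def integrate_eq36(mu, lam, k, l, t_end=50.0, dt=1e-3):
    """Forward-Euler integration of the assumed form of Eq. (36).

    Returns the mean fitness 2(k-1) z_0 + 1 at time t_end.
    """
    a = math.exp(-mu * (1.0 - lam / 2.0))
    b = math.exp(-mu * lam / 2.0)
    f = f_l(mu, lam, l)
    # Assumed initial condition: only master genomes, so z_0 = 1/2 and kbar = k.
    z0000 = z1 = z2 = 0.5
    for _ in range(int(round(t_end / dt))):
        kbar = 2.0 * (k - 1.0) * (2.0 * z1 - z0000) + 1.0
        w = (k - 1.0) * z1 + z2
        dz0000 = -(k + kbar) * z0000 + 2.0 * a * w
        dz1 = -(k + kbar) * z1 + a * (1.0 + f) * w
        dz2 = -kbar * z2 + (a + b - 1.0) * w
        z0000 += dt * dz0000
        z1 += dt * dz1
        z2 += dt * dz2
    return 2.0 * (k - 1.0) * (2.0 * z1 - z0000) + 1.0

def closed_form(mu, lam, k, l):
    """Larger root of kbar^2 - A kbar - B = 0 (valid below the error catastrophe)."""
    a = math.exp(-mu * (1.0 - lam / 2.0))
    b = math.exp(-mu * lam / 2.0)
    f = f_l(mu, lam, l)
    A = k * ((1.0 + f) * a - 1.0) - f * a + b - 1.0
    B = k * (a + b - 1.0)
    return 0.5 * (A + math.sqrt(A * A + 4.0 * B))

if __name__ == "__main__":
    print(integrate_eq36(1.0, 0.5, 10.0, 1), closed_form(1.0, 0.5, 10.0, 1))
```

Forward Euler is used only for brevity; any standard ODE integrator would serve equally well for this small, non-stiff system.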

Now, note that $z_0 = z_{(0,0,0,0)} + 2(z_1 - z_{(0,0,0,0)})$. Furthermore, note from Eq. (32) that $\bar{\kappa}(t) = k(2z_0) + (1 - 2z_0) = 2(k-1)z_0 + 1$. Setting the left-hand side of Eq. (36) to 0, we may systematically eliminate variables to obtain,

$$0 = \frac{z_1}{\bar{\kappa}(t=\infty) - \left(e^{-\mu\lambda/2} + e^{-\mu(1-\lambda/2)} - 1\right)}\left(\bar{\kappa}(t=\infty)^2 - A(\mu,\lambda)\,\bar{\kappa}(t=\infty) - B(\mu,\lambda)\right) \tag{37}$$



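where,

$$\begin{aligned}
A(\mu,\lambda) &= k\left((1 + f_l(\mu,\lambda))e^{-\mu(1-\lambda/2)} - 1\right) - f_l(\mu,\lambda)e^{-\mu(1-\lambda/2)} + e^{-\mu\lambda/2} - 1 \\
B(\mu,\lambda) &= k\left(e^{-\mu\lambda/2} + e^{-\mu(1-\lambda/2)} - 1\right)
\end{aligned} \tag{38}$$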



Eq. (37) admits multiple solutions. To determine the physical solution at a given $\mu$, we note that we want $\bar{\kappa}(t=\infty) = k$ for $\mu = 0$. This simply reflects the fact that when replication is error-free, the population consists entirely of viable genomes. Therefore, for sufficiently small $\mu$, the equilibrium mean fitness is given by,

$$\bar{\kappa}(t=\infty) = \frac{1}{2}\left[A(\mu,\lambda) + \sqrt{A(\mu,\lambda)^2 + 4B(\mu,\lambda)}\right] \tag{39}$$
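Eq. (39) is straightforward to evaluate numerically. The sketch below is one way to do so, assuming $A(\mu,\lambda) = k((1+f_l)e^{-\mu(1-\lambda/2)} - 1) - f_l e^{-\mu(1-\lambda/2)} + e^{-\mu\lambda/2} - 1$ and $B(\mu,\lambda) = k(e^{-\mu\lambda/2} + e^{-\mu(1-\lambda/2)} - 1)$. The function names are hypothetical, `l=None` is used here as a stand-in for $l = \infty$, and the clamp to 1 encodes the statement that the mean fitness equals that of the unviable genomes beyond the error catastrophe.

```python
import math

def terms(mu, lam, l):
    """Return (a, b, f*a) with a = e^{-mu(1-lam/2)} and b = e^{-mu lam/2}.

    l=None stands for l = infinity, where f_l = e^{mu(1-lam)} and f*a = b exactly.
    """
    a = math.exp(-mu * (1.0 - lam / 2.0))
    b = math.exp(-mu * lam / 2.0)
    if l is None:
        return a, b, b
    x = mu * (1.0 - lam)
    f = sum(x**j / math.factorial(j) for j in range(l + 1))
    return a, b, f * a

def equilibrium_mean_fitness(mu, lam, k, l):
    """Equilibrium mean fitness from Eq. (39), clamped to 1 past the error catastrophe."""
    a, b, fa = terms(mu, lam, l)
    A = k * (a + fa - 1.0) - fa + b - 1.0
    B = k * (a + b - 1.0)
    disc = A * A + 4.0 * B
    root = 0.5 * (A + math.sqrt(disc)) if disc >= 0.0 else 1.0
    return max(root, 1.0)

if __name__ == "__main__":
    for mu in (0.0, 0.5, 1.0, 1.5, 2.0):
        print(mu, equilibrium_mean_fitness(mu, 0.5, 10.0, 1))
```

For $\lambda = 1$, $l = 0$ this reduces to the semiconservative result $k(2e^{-\mu/2} - 1)$, and for $\lambda = 0$, $l = \infty$ to the conservative result $ke^{-\mu}$, which provides two quick consistency checks.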



The equilibrium mean fitness is given by this expression until the error catastrophe, which occurs when the value of $\bar{\kappa}(t=\infty)$ given by the formula above equals 1. At this point, the selective advantage for remaining viable is no longer sufficiently strong to localize the population about the viable genomes. The fraction of viable genomes drops to 0, and the fitness of the population simply becomes the fitness of the unviable genomes.

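Setting $\bar{\kappa}(t=\infty) = 1$ in Eq. (37), it is possible to show, after some manipulation, that the critical value of $\mu$, denoted $\mu_{crit}$, is the solution to the equation,

$$e^{-\mu(1-\lambda/2)} = \frac{(k+1)\left(2 - e^{-\mu\lambda/2}\right)}{k\left(2 + f_l(\mu,\lambda)\right) - f_l(\mu,\lambda)} \tag{40}$$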
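Because Eq. (40) is transcendental for general $l$ and $\lambda$, $\mu_{crit}$ is most easily found numerically. The sketch below uses bisection on the equivalent condition $A(\mu,\lambda) + B(\mu,\lambda) - 1 = 0$ (that is, $\bar{\kappa} = 1$ in the quadratic for the mean fitness), assuming $A = k((1+f_l)e^{-\mu(1-\lambda/2)} - 1) - f_l e^{-\mu(1-\lambda/2)} + e^{-\mu\lambda/2} - 1$ and $B = k(e^{-\mu\lambda/2} + e^{-\mu(1-\lambda/2)} - 1)$. The function names and the convention `l=None` for $l = \infty$ are illustrative.

```python
import math

def catastrophe_gap(mu, lam, k, l):
    """A + B - 1, which is positive below the error catastrophe and zero at mu_crit."""
    a = math.exp(-mu * (1.0 - lam / 2.0))
    b = math.exp(-mu * lam / 2.0)
    if l is None:          # l = infinity: f_l * a = b exactly
        fa = b
    else:
        x = mu * (1.0 - lam)
        fa = a * sum(x**j / math.factorial(j) for j in range(l + 1))
    A = k * (a + fa - 1.0) - fa + b - 1.0
    B = k * (a + b - 1.0)
    return A + B - 1.0

def mu_crit(lam, k, l):
    """Bisection for the error threshold at finite k > 1."""
    lo, hi = 0.0, 1.0
    while catastrophe_gap(hi, lam, k, l) > 0.0:   # bracket the root
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if catastrophe_gap(mid, lam, k, l) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

if __name__ == "__main__":
    for lam in (0.0, 0.5, 1.0):
        print(lam, mu_crit(lam, 10.0, 0), mu_crit(lam, 10.0, 1))
```

Bisection is adequate here because the gap function decreases monotonically in $\mu$ for $k > 1$, so the bracketed root is unique.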
V. RESULTS AND DISCUSSION

A. Behavior of the model for specific values of $l$ and $\lambda$

1. $l = 0$

When $l = 0$, our fitness landscape corresponds to a single fitness peak landscape which tolerates no lesions. We have $f_0(\mu,\lambda) = 1$, so, as $k \to \infty$, we have, at the error catastrophe, that

$$e^{-\mu_{crit}(1-\lambda/2)} = \frac{1}{3}\left(2 - e^{-\mu_{crit}\lambda/2}\right) \tag{41}$$

When $\lambda = 0$, we obtain $e^{-\mu_{crit}} = 1/3$, while when $\lambda = 1$, we obtain $e^{-\mu_{crit}/2} = 1/2$. Note that, after daughter strand synthesis and lesion repair, the probability that a given parent base is matched up with the proper daughter base is given by $1 - \epsilon + (\lambda/2)\epsilon$. The reason for this is that correct base pair synthesis occurs with probability $1 - \epsilon$. A mismatch occurs with probability $\epsilon$, which is correctly repaired during lesion repair with probability $\lambda/2$. In the limit of infinite sequence length, the probability of correct daughter strand synthesis then becomes $\lim_{L\to\infty}(1 - \epsilon(1 - \lambda/2))^L = e^{-\mu(1-\lambda/2)}$.

Note then that for $\lambda = 0$, the critical daughter strand synthesis probability is lower than for $\lambda = 1$. The reason for this is that when lesion repair is turned off, the parent strands are unaffected by the replication process, and hence the information in the master genome is preserved by replication. Thus, although viability may be lost through erroneous replication, preserving the information in the parent strand makes it possible to recover a viable genome in the next replication cycle. The result is a delay in the critical replication fidelity to a lower value of $e^{-\mu_{crit}}$ than would be expected if it were assumed that unviable genomes cannot replicate into viable ones (the expected value from such an assumption is 1/2).

2. $l = \infty$

For $l = \infty$, our fitness landscape is one where only one of the "master" strands is necessary to confer viability. In this case, we have $f_\infty(\mu,\lambda) = e^{\mu(1-\lambda)}$, which gives $A(\mu,\lambda) = B(\mu,\lambda) - 1$. Below the error catastrophe, we therefore have,

$$\bar{\kappa}(t=\infty) = B(\mu,\lambda) = k\left(e^{-\mu\lambda/2} + e^{-\mu(1-\lambda/2)} - 1\right) \tag{42}$$

Note that for $\lambda = 0$ we obtain $\bar{\kappa}(t=\infty) = ke^{-\mu}$. Thus, when lesion repair is turned off, we obtain an effectively conservatively replicating system. The error catastrophe in this case can be delayed to arbitrarily high mutation rates by increasing the replication rate of the viable genomes. This result was first derived by Brumer and Shakhnovich [30], and led to the hypothesis that imperfect lesion repair could reconcile semiconservative replication with the high mutation rates observed in many cancers (specifically the Microsatellite INstability, or MIN, tumors).


B. Similarities to both conservative and 
semiconservative replication 

Semiconservative replication with imperfect lesion repair bears a number of similarities to, and differences from, both semiconservative replication with perfect lesion repair and conservative replication. We have shown earlier that the original semiconservative equations are obtained when $\lambda = 1$. Furthermore, we have also shown that when lesion repair is turned off, then if the fitness depends on only one of the strands, a semiconservatively replicating population becomes an effectively conservatively replicating one. For arbitrary lesion repair probabilities and for a given maximum lesion value $l$, it is interesting to explore which features of semiconservative and conservative replication are retained.

There are two key differences between conservative and semiconservative replication which we will explore here. First of all, for the single fitness peak landscape, the equilibrium mean fitness of a conservatively replicating system below the error catastrophe is $ke^{-\mu}$, which gives $\mu_{crit} = \ln k$. For a semiconservatively replicating system, we have an equilibrium mean fitness of $k(2e^{-\mu/2} - 1)$, which gives $\mu_{crit} = 2\ln[2/(1 + 1/k)]$. Note that, as $k \to \infty$, $\mu_{crit} \to \infty$ for a conservative system, while for a semiconservative system, $\mu_{crit} \to 2\ln 2$. Thus, for a conservatively replicating system, the error threshold can be pushed to arbitrarily high mutation rates by making the growth rate of the master genome arbitrarily large. For semiconservative replication, in contrast, there is a maximal value to the error threshold. If the mutation rate exceeds this value, then no quasispecies will exist, independent of the growth rate constant of the viable genomes.
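The contrast between the two thresholds can be tabulated directly from the closed forms quoted above, $\mu_{crit} = \ln k$ and $\mu_{crit} = 2\ln[2/(1+1/k)]$. The short Python sketch below does this for a few values of $k$; the function names are illustrative only.

```python
import math

def mu_crit_conservative(k):
    """Error threshold for conservative replication: k e^{-mu} = 1."""
    return math.log(k)

def mu_crit_semiconservative(k):
    """Error threshold for semiconservative replication: k (2 e^{-mu/2} - 1) = 1."""
    return 2.0 * math.log(2.0 / (1.0 + 1.0 / k))

if __name__ == "__main__":
    print("k, conservative, semiconservative (upper bound 2 ln 2 = %.4f)" % (2.0 * math.log(2.0)))
    for k in (2.0, 10.0, 1e3, 1e6):
        print(k, mu_crit_conservative(k), mu_crit_semiconservative(k))
```

The conservative threshold grows without bound as $k$ increases, while the semiconservative threshold saturates just below $2\ln 2$.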

The reason for this difference in behavior is that con- 
servative replication preserves a copy of the original 
genome. Therefore, no matter how high the mutation 
rate, by replicating fast enough, it is possible to produce 
viable genomes at a sufficient rate to out-replicate the un- 
viable genomes, and thereby localize the population to a 
well-defined quasispecies. With semiconservative replica- 
tion, the original genome is destroyed by the replication 
process. Therefore, on average, it is necessary for a viable 
genome to produce at least one viable copy per replica- 
tion cycle. Otherwise, the net growth rate of the viable 
genomes becomes negative, and replicating faster simply 
kills off the viable population more quickly. 

For arbitrary lesion repair probabilities, we can determine the value of $\mu_{crit}$ in the limit of $k \to \infty$ by solving,

$$e^{-\mu(1-\lambda/2)} = \frac{2 - e^{-\mu\lambda/2}}{2 + f_l(\mu,\lambda)} \tag{43}$$

which may be rearranged to give,

$$\left(2 + f_l(\mu,\lambda)\right)e^{-\mu(1-\lambda/2)} + e^{-\mu\lambda/2} - 2 = 0 \tag{44}$$

When $\mu = 0$, $f_l(\mu,\lambda) = 1$, so the left-hand side evaluates to 2. When $\mu = \infty$, the left-hand side evaluates to $\lim_{\mu\to\infty}\left(f_l(\mu,\lambda)e^{-\mu(1-\lambda/2)} + e^{-\mu\lambda/2} - 2\right)$. For finite $l$, $f_l(\mu,\lambda)$ is a polynomial, hence we obtain a limit of $-2$ for $\lambda > 0$, and a limit of $-1$ for $\lambda = 0$. Therefore, by the Intermediate Value Theorem, Eq. (44) has a solution, and so even as $k \to \infty$, $\mu_{crit}$ remains finite. This means that, for all finite $l$, semiconservative replication with arbitrary lesion repair is similar to the original semiconservative model in that there is an upper limit to the mutation rate before the error catastrophe occurs, independent of the growth rate of the viable genomes.

If $l = \infty$, then $f_l(\mu,\lambda) = e^{\mu(1-\lambda)}$, which gives $\lim_{\mu\to\infty} 2(e^{-\mu\lambda/2} - 1)$. When $\lambda > 0$, this limit is $-2$, so again $\mu_{crit}$ remains finite. When $\lambda = 0$, note that Eq. (44) evaluates to $2e^{-\mu} = 0$, which has no solution for finite $\mu$.

Therefore, unless $\lambda = 0$ and $l = \infty$, semiconservative replication with arbitrary lesion repair also has an upper bound to the mutation rate before the error catastrophe occurs. This makes sense, because, to ensure that after replication at least one of the daughter genomes is viable, it is necessary to prevent lesion repair from creating an unviable genome, and it is necessary to prevent lesions from destroying viability.
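The $k \to \infty$ bound on $\mu_{crit}$ can be computed by solving $(2 + f_l(\mu,\lambda))e^{-\mu(1-\lambda/2)} + e^{-\mu\lambda/2} - 2 = 0$ numerically. The following Python sketch does so by bisection. The function names, the `l=None` convention for $l = \infty$, and the search cap `mu_max` are assumptions made for illustration; when no sign change is found below the cap, the sketch reports an infinite threshold.

```python
import math

def eq44_lhs(mu, lam, l):
    """Left-hand side (2 + f_l) e^{-mu(1-lam/2)} + e^{-mu lam/2} - 2."""
    a = math.exp(-mu * (1.0 - lam / 2.0))
    b = math.exp(-mu * lam / 2.0)
    if l is None:          # l = infinity: f_l * a = e^{-mu lam/2} exactly
        fa = b
    else:
        x = mu * (1.0 - lam)
        fa = a * sum(x**j / math.factorial(j) for j in range(l + 1))
    return 2.0 * a + fa + b - 2.0

def mu_crit_infinite_k(lam, l, mu_max=1e3):
    """Largest possible error threshold (k -> infinity); math.inf if none below mu_max."""
    lo, hi = 0.0, 1.0
    while eq44_lhs(hi, lam, l) >= 0.0:
        hi *= 2.0
        if hi > mu_max:
            return math.inf
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if eq44_lhs(mid, lam, l) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

if __name__ == "__main__":
    for l in (0, 1, 5, None):
        print(l, [mu_crit_infinite_k(lam, l) for lam in (0.0, 0.5, 1.0)])
```

For $l = 0$ the sketch returns $\ln 3$ at $\lambda = 0$ and $2\ln 2$ at $\lambda = 1$, and for $l = \infty$, $\lambda = 0$ it returns infinity, consistent with the conservative limit discussed in the text.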

The second feature of semiconservative and conservative replication which we will consider has to do with the behavior of $\bar{\kappa}(t=\infty)$ near the error catastrophe. To make matters concrete, define $\bar{\kappa}_{equil}(\mu) = \bar{\kappa}(t=\infty)$, and let us consider the behavior of $\bar{\kappa}'_{equil}(\mu)$ for $\mu \to \mu_{crit}$.

For conservative replication, $\bar{\kappa}'_{equil}(\mu) = -ke^{-\mu}$, so $\lim_{\mu\to\mu_{crit}^-}\bar{\kappa}'_{equil}(\mu) = -1$. For semiconservative replication, $\bar{\kappa}'_{equil}(\mu) = -ke^{-\mu/2}$, so $\lim_{\mu\to\mu_{crit}^-}\bar{\kappa}'_{equil}(\mu) = -(k+1)/2$. As $k \to \infty$, this derivative goes to $-\infty$. In Appendix A, we will show that unless $\lambda = 1$, $\bar{\kappa}'_{equil}(\mu)$ remains finite as $\mu \to \mu_{crit}$, assuming $l$ is finite.

In this sense, then, imperfect lesion repair is similar to conservative replication. The reason for this behavior is that, when lesion repair is imperfect, the correlation between the parent and daughter strands is broken. Therefore, it is possible that an erroneous daughter strand is synthesized, but that the errors are not communicated to the parent strand. On a subsequent replication cycle, the undamaged parent strand may be reintegrated into a master genome. Near the error catastrophe, where the effective growth rate of the viable genomes is close to that of the unviable genomes, this effect slows down the rate at which the fitness decreases, leading to a finite value for $\bar{\kappa}'_{equil}(\mu_{crit})$.

Interestingly, for $l = \infty$, the derivative at the error catastrophe becomes infinite as $k \to \infty$ for $\lambda > 0$.
FIG. 1: Comparison of theory and simulation results for $l = 0$.

FIG. 2: Comparison of theory and simulation results for $l = 1$.



C. Stochastic simulations 

In order to compare the results of our theory with actual numerics, we ran stochastic simulations of finite populations of replicating organisms. Specifically, we determined $\bar{\kappa}(t=\infty)$ at various values of $\mu$ for $l = 0$ (Figure 1), $l = 1$ (Figure 2), and $l = \infty$ (Figure 3). We considered genomes of length 40, and populations of 1,000 organisms. Our results were obtained by averaging over 10 independent runs, where each run consisted of 10,000 time steps of size 0.01. We took $k = 10$.
FIG. 3: Comparison of theory and simulation results for $l = \infty$.
Note the excellent agreement between theory and simulation. One interesting feature to note is that for $l = 1$, the $\lambda = 1$ fitness is slightly larger than the $\lambda = 0$ fitness for almost all $\mu$ below the error catastrophe. However, the $\lambda = 1$ error catastrophe happens before the $\lambda = 0$ error catastrophe; consequently, there is a region where the $\lambda = 0$ fitness becomes greater. We give a possible explanation for this phenomenon: Below the error catastrophe, it is advantageous to maintain the highest replication fidelity possible, which is done by maximizing the lesion repair efficiency. A tolerance of one lesion is not sufficient to provide a selective advantage for inefficient lesion repair. However, when the $\lambda = 1$ error catastrophe is reached, at $\mu_{crit} \approx 2\ln 2$, then lesion repair no longer reduces the error rate by a sufficient amount to avoid the death of the population. At this point, it becomes advantageous to turn lesion repair off. With lesion repair turned off, any replication mistakes that are made remain in the daughter strand. Thus, the parent strand is preserved, and since the master genome can tolerate some lesions, it is still possible to produce a viable genome. On a subsequent replication cycle, the unchanged parent strands can be reintegrated into a master genome.

As described above, the result of these competing effects is that the $\lambda = 1$ fitness is greater than the $\lambda = 0$ fitness almost until the $\lambda = 1$ error catastrophe. However, just before the $\lambda = 1$ error catastrophe, the mean fitness switches, and the $\lambda = 0$ catastrophe happens after the $\lambda = 1$ catastrophe.

We should note that at $\mu_{crit} \approx 2\ln 2 = \ln 4$, an average of about 1 mismatch is made per daughter strand synthesis. Thus, without lesion repair, the tolerance of a one-base lesion in the master genome is just sufficient to preserve viability at the $\lambda = 1$ value for $\mu_{crit}$. As the tolerance for lesions grows, the selective advantage for turning off lesion repair even below the error catastrophe increases as well. Eventually, for $l = \infty$, when lesion repair is turned off we obtain conservative replication, which has a higher fitness than semiconservative replication at all mutation rates.

VI. CONCLUSIONS AND FUTURE RESEARCH 

This paper developed the quasispecies equations suitable for describing semiconservative replication with imperfect lesion repair. The work presented here may be regarded as a continuation of the work in [27], which provided the quasispecies equations for semiconservative replication, under the assumption of perfect lesion repair. We solved the model for a generalized "single-fitness-peak" landscape where the master genome can sustain a finite number $l$ of lesions and remain viable. For future research, it will be interesting to consider the behavior of the model for more realistic landscapes. Specifically, we would like to explore the behavior of the model when a genome is viable even for positive values of $l_C$. In the original semiconservative quasispecies equations, a fitness landscape which allows for a finite number of point mutations before loss of viability does not delay the occurrence of the error catastrophe beyond what is predicted in the single-fitness-peak model [22]. We expect this result to change when lesion repair is imperfect.

Furthermore, we plan to apply the quasispecies model with imperfect lesion repair to stem cell growth, explicitly incorporating the "immortal strand" segregation mechanism. Due to the nature of stem cell division and tissue development, such a model moves beyond the simple model of genomes replicating in a chemostat [22, 38], where each genome has an equal probability of being removed from the population. Indeed, we will need to develop a further extension of the imperfect lesion repair quasispecies equations, using techniques from what is known as evolutionary graph theory.
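APPENDIX A: DETERMINATION OF $\bar{\kappa}'_{equil}(\mu_{crit})$

To evaluate $\bar{\kappa}'_{equil}(\mu)$ for arbitrary lesion repair, we start with the fact that for $\mu < \mu_{crit}$, $\bar{\kappa}_{equil}(\mu)$ satisfies,

$$0 = \bar{\kappa}_{equil}(\mu)^2 - A(\mu,\lambda)\,\bar{\kappa}_{equil}(\mu) - B(\mu,\lambda) \tag{A1}$$

Differentiating both sides gives,

$$0 = 2\bar{\kappa}_{equil}\bar{\kappa}'_{equil} - \partial_\mu A\,\bar{\kappa}_{equil} - A\,\bar{\kappa}'_{equil} - \partial_\mu B \tag{A2}$$

When $\mu = \mu_{crit}$ we have $\bar{\kappa}_{equil} = 1$, giving,

$$\bar{\kappa}'_{equil} = -k\,\frac{e^{-\mu(1-\lambda/2)}\left((1-\frac{\lambda}{2})(2 + f_l(\mu,\lambda)) - (1-\lambda)f_{l-1}(\mu,\lambda)\right) + \frac{\lambda}{2}e^{-\mu\lambda/2}}{3 + f_l(\mu,\lambda)e^{-\mu(1-\lambda/2)} - e^{-\mu\lambda/2} - k\left((1 + f_l(\mu,\lambda))e^{-\mu(1-\lambda/2)} - 1\right)} + \frac{\left((1-\frac{\lambda}{2})f_l(\mu,\lambda) - (1-\lambda)f_{l-1}(\mu,\lambda)\right)e^{-\mu(1-\lambda/2)} - \frac{\lambda}{2}e^{-\mu\lambda/2}}{3 + f_l(\mu,\lambda)e^{-\mu(1-\lambda/2)} - e^{-\mu\lambda/2} - k\left((1 + f_l(\mu,\lambda))e^{-\mu(1-\lambda/2)} - 1\right)} \tag{A3}$$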
where we used the identity $\partial_\mu f_l(\mu,\lambda) = (1-\lambda)f_{l-1}(\mu,\lambda)$.

Note that the numerator of the first fraction is positive (where we are neglecting the factor of $-k$), so as $k \to \infty$ and $\mu \to \mu_{crit}$, if $(1 + f_l(\mu_{crit},\lambda))e^{-\mu_{crit}(1-\lambda/2)} - 1 = 0$, then $\bar{\kappa}'_{equil} \to -\infty$. Conversely, if $(1 + f_l(\mu_{crit},\lambda))e^{-\mu_{crit}(1-\lambda/2)} - 1 \neq 0$, then the denominator ensures that the derivative remains finite.

Now, at the error catastrophe, we have $A(\mu_{crit},\lambda) = 1 - B(\mu_{crit},\lambda)$. Therefore, plugging into our expression for $\bar{\kappa}'_{equil}$, we get that $\bar{\kappa}'_{equil}$ is infinite if and only if $e^{-\mu_{crit}\lambda/2} + e^{-\mu_{crit}(1-\lambda/2)} - 1 = 0$. However, we have shown that $\bar{\kappa}'_{equil}$ is infinite if and only if $(1 + f_l(\mu_{crit},\lambda))e^{-\mu_{crit}(1-\lambda/2)} - 1 = 0$. Therefore, if $\bar{\kappa}'_{equil}$ is infinite, then we must have $f_l(\mu_{crit},\lambda) = e^{\mu_{crit}(1-\lambda)} = f_\infty(\mu_{crit},\lambda)$. For finite $l$, note that $f_l(\mu_{crit},\lambda) \le f_\infty(\mu_{crit},\lambda)$, with equality only when $\mu_{crit}(1-\lambda) = 0 \Rightarrow \lambda = 1$.

Therefore, for finite $l$, $\lim_{\mu\to\mu_{crit}}\bar{\kappa}'_{equil}(\mu)$ remains finite as $k \to \infty$ as long as $\lambda < 1$.

When $l = \infty$, then $\bar{\kappa}_{equil}(\mu) = k(e^{-\mu\lambda/2} + e^{-\mu(1-\lambda/2)} - 1)$ below the error catastrophe. It is readily shown that, except for $\lambda = 0$, the derivative at the error catastrophe becomes infinite as $k \to \infty$.
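The behavior of the slope at the error catastrophe can also be checked numerically. The sketch below locates $\mu_{crit}$ by bisection and then applies the implicit-differentiation result at $\bar{\kappa}_{equil} = 1$, namely $\bar{\kappa}'_{equil} = (\partial_\mu A + \partial_\mu B)/(2 - A)$, with the derivatives approximated by central differences rather than by the closed form (A3). It assumes $A = k((1+f_l)e^{-\mu(1-\lambda/2)} - 1) - f_l e^{-\mu(1-\lambda/2)} + e^{-\mu\lambda/2} - 1$ and $B = k(e^{-\mu\lambda/2} + e^{-\mu(1-\lambda/2)} - 1)$; the function names, the step `h`, and the `l=None` convention for $l = \infty$ are illustrative.

```python
import math

def A_and_B(mu, lam, k, l):
    """Coefficients of kbar^2 - A kbar - B = 0; l=None stands for l = infinity."""
    a = math.exp(-mu * (1.0 - lam / 2.0))
    b = math.exp(-mu * lam / 2.0)
    if l is None:
        fa = b
    else:
        x = mu * (1.0 - lam)
        fa = a * sum(x**j / math.factorial(j) for j in range(l + 1))
    A = k * (a + fa - 1.0) - fa + b - 1.0
    B = k * (a + b - 1.0)
    return A, B

def slope_at_catastrophe(lam, k, l, h=1e-6):
    """d(kbar_equil)/d(mu) at mu_crit, from (dA/dmu + dB/dmu) / (2 - A)."""
    def gap(mu):
        return sum(A_and_B(mu, lam, k, l)) - 1.0   # zero when kbar = 1
    lo, hi = 0.0, 1.0
    while gap(hi) > 0.0:
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if gap(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    mu_c = 0.5 * (lo + hi)
    A, _unused = A_and_B(mu_c, lam, k, l)
    Ap, Bp = A_and_B(mu_c + h, lam, k, l)
    Am, Bm = A_and_B(mu_c - h, lam, k, l)
    return ((Ap - Am) + (Bp - Bm)) / (2.0 * h) / (2.0 - A)

if __name__ == "__main__":
    for k in (1e1, 1e3, 1e6):
        print(k, slope_at_catastrophe(0.5, k, 1), slope_at_catastrophe(1.0, k, 0))
```

With $\lambda = 1$, $l = 0$ the slope follows $-(k+1)/2$ and diverges with $k$, whereas for $\lambda < 1$ and finite $l$ the printed values level off as $k$ grows.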



APPENDIX B: NOTES ON THE IMPLEMENTATION OF THE STOCHASTIC SIMULATIONS

Stochastic simulations are run using a finite population of $N$ replicating genomes of length $L$. The simulation is run out to some prespecified time $T$ at time steps of some prespecified $\Delta t$. We try to choose $T$ large enough to obtain good equilibration of the population, and $\Delta t$ small enough so that one can reasonably make a continuous time assumption.

At each time step, we cycle over each organism in the population, and determine whether it replicates in that time interval. The replication probability $P_{\{\sigma,\sigma'\}}$ of an organism with genome $\{\sigma,\sigma'\}$ may be computed from the first-order growth rate constant in one of two ways: $P_{\{\sigma,\sigma'\}} = \min\{\kappa_{\{\sigma,\sigma'\}}\Delta t, 1\}$, or $P_{\{\sigma,\sigma'\}} = 1 - e^{-\kappa_{\{\sigma,\sigma'\}}\Delta t}$. In practice, we choose $\Delta t$ to be sufficiently small so that the two definitions yield almost identical results.

If an organism replicates, then it is effectively destroyed, and it produces two new organisms. At the end of each replication cycle, we randomly remove organisms from the population until the population size returns to $N$.
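The procedure described in this appendix can be sketched in a few dozen lines of Python. The version below is a simplified illustration, not the code used for the figures: it assumes a binary alphabet, stores both strands in a common frame so that the master genome is all zeros on each strand, applies a per-base error probability $\epsilon = \mu/L$ with lesion repair probability $\lambda$ (repair picks the parent or daughter base with probability 1/2 each), and uses the $\min\{\kappa\Delta t, 1\}$ replication rule. All function and parameter names are hypothetical.

```python
import random

def is_viable(genome, l):
    """Viable if no position is mutated on both strands (l_C = 0) and lesions <= l."""
    lesions = 0
    for x, y in zip(*genome):
        if x and y:
            return False
        if x != y:
            lesions += 1
    return lesions <= l

def new_genome(template, eps, lam, rng):
    """Daughter-strand synthesis on one parent strand, followed by lesion repair."""
    parent = list(template)
    daughter = list(template)
    for i in range(len(parent)):
        if rng.random() < eps:             # mismatch during synthesis
            daughter[i] ^= 1
            if rng.random() < lam:         # the lesion is repaired ...
                if rng.random() < 0.5:
                    daughter[i] = parent[i]    # ... correctly
                else:
                    parent[i] = daughter[i]    # ... by fixing the error in the parent
    return (tuple(parent), tuple(daughter))

def simulate(N, L, mu, lam, k, l, T, dt, seed=0):
    """Return the population mean fitness after running to time T."""
    rng = random.Random(seed)
    eps = mu / L
    master = ((0,) * L, (0,) * L)
    pop = [master] * N
    for _ in range(int(round(T / dt))):
        nxt = []
        for g in pop:
            kappa = k if is_viable(g, l) else 1.0
            if rng.random() < min(kappa * dt, 1.0):
                # The replicating organism is destroyed; each strand templates a new genome.
                nxt.append(new_genome(g[0], eps, lam, rng))
                nxt.append(new_genome(g[1], eps, lam, rng))
            else:
                nxt.append(g)
        pop = rng.sample(nxt, N)           # random removal back down to N organisms
    return sum(k if is_viable(g, l) else 1.0 for g in pop) / N

if __name__ == "__main__":
    print(simulate(N=200, L=40, mu=1.0, lam=0.5, k=10.0, l=1, T=20.0, dt=0.01))
```

Passing `l=float("inf")` gives the $l = \infty$ landscape in this sketch, and averaging the returned value over several seeds mimics the averaging over independent runs described in the text.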